writing process hung at txg_wait_open #3645
Comments
@jxiong have you been able to reproduce this with the latest code?
I only saw it once the last time I ran this. I haven't tried it on the latest baseline yet; I will do that.
I've run across something very similar. The workload was a parallel bulk load into Postgres on EC2, running Amazon Linux 2016.03 with ZFS 0.6.5.7-1.el6. A postmaster process was using 100% system CPU right before the instance was auto-restarted by monitoring, which means it was unable to accept connections for several consecutive minutes. One other interesting detail is that it was preceded by an OOM condition, though this system has a
Closing as stale.
I saw this issue while benchmarking ZFS performance. An iozone thread has been hung for a long time, and the txg_sync process is busy waiting. Please check the backtraces of both processes below.
{noformat}
[10459.403319] iozone D ffff88082f353440 0 6714 1 0x00000004
[10459.403321] ffff880085e83b28 0000000000000082 ffff88055b805180 ffff880085e83fd8
[10459.403323] 0000000000013440 0000000000013440 ffff880803713d20 ffff8807f3701368
[10459.403326] ffff8807f3701220 ffff8807f3701370 ffff8807f3701248 0000000000000000
[10459.403328] Call Trace:
[10459.403330] [] schedule+0x29/0x70
[10459.403334] [] cv_wait_common+0xe5/0x120 [spl]
[10459.403337] [] ? prepare_to_wait_event+0x100/0x100
[10459.403341] [] __cv_wait+0x15/0x20 [spl]
[10459.403360] [] txg_wait_open+0x83/0xd0 [zfs]
[10459.403379] [] dmu_tx_wait+0x380/0x390 [zfs]
[10459.403387] [] ? mutex_lock+0x12/0x2f
[10459.403406] [] dmu_tx_assign+0x9a/0x510 [zfs]
[10459.403423] [] dmu_free_long_range+0x18c/0x240 [zfs]
[10459.403449] [] zfs_rmnode+0x5d/0x340 [zfs]
[10459.403473] [] zfs_zinactive+0x168/0x180 [zfs]
[10459.403494] [] zfs_inactive+0x60/0x200 [zfs]
[10459.403518] [] zpl_evict_inode+0x43/0x60 [zfs]
[10459.403521] [] evict+0xb4/0x180
[10459.403523] [] iput+0xf5/0x180
[10459.403525] [] do_unlinkat+0x193/0x2c0
[10459.403528] [] ? SYSC_newstat+0x2f/0x40
[10459.403530] [] SyS_unlink+0x16/0x20
[10459.403532] [] system_call_fastpath+0x1a/0x1f
{noformat}
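For context on the first trace: the unlink path goes zfs_rmnode() → dmu_free_long_range() → dmu_tx_assign() → dmu_tx_wait() → txg_wait_open(), so the iozone thread is simply asleep on a condition variable until the currently open transaction group advances, and that cannot happen while the txg pipeline is backed up behind the sync thread. Below is a minimal userspace sketch of that wait pattern, using pthreads and simplified names of my own rather than the real ZFS/SPL code, just to illustrate why the thread sleeps forever if the sync side never completes:
{noformat}
/*
 * Illustrative sketch only -- not ZFS source.  A "writer" blocks until the
 * open txg number reaches the one it needs; a "sync" thread advances it
 * after finishing each group.  If the sync thread stalls, the writer
 * sleeps in the cv_wait loop indefinitely, which is what the iozone
 * stack above shows.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  tx_open_cv = PTHREAD_COND_INITIALIZER;
static uint64_t open_txg = 1;           /* txg currently accepting writes */

/* Rough analogue of txg_wait_open(): sleep until open_txg reaches txg. */
static void wait_for_open(uint64_t txg)
{
        pthread_mutex_lock(&tx_lock);
        while (open_txg < txg)
                pthread_cond_wait(&tx_open_cv, &tx_lock);
        pthread_mutex_unlock(&tx_lock);
}

/* Rough analogue of the sync side: open the next group after each sync. */
static void *sync_thread(void *arg)
{
        (void)arg;
        for (;;) {
                sleep(1);               /* stand-in for spa_sync() work */
                pthread_mutex_lock(&tx_lock);
                open_txg++;             /* advance the open txg */
                pthread_cond_broadcast(&tx_open_cv);
                pthread_mutex_unlock(&tx_lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t tid;

        pthread_create(&tid, NULL, sync_thread, NULL);
        wait_for_open(3);               /* blocks until txg 3 is open */
        printf("txg 3 is now open\n");
        return 0;
}
{noformat}
(Compile with gcc -pthread. Comment out the open_txg++ line and the writer hangs the same way the iozone thread does here.)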
and txg_sync:
{noformat}
[16932.253158] txg_sync R running task 0 26338 2 0x00000000
[16932.253166] ffff8807b1c77c38 ffff8807f659a3e8 ffffc900086ad7f0 000000000ccd7dca
[16932.253177] ffffffffc09adb20 ffffffffc09adb20 0000000000000000 000000000000c210
[16932.253185] 0000000000001000 0000000000000000 0000000000000000 ffff8807b1c77c68
[16932.253187] Call Trace:
[16932.253190] [] ? __kmalloc_node+0x1c9/0x2a0
[16932.253193] [] ? spl_kmem_zalloc+0xc0/0x170 [spl]
[16932.253197] [] ? spl_kmem_zalloc+0xc0/0x170 [spl]
[16932.253201] [] ? __cv_wait_io+0x18/0x20 [spl]
[16932.253226] [] ? zio_wait+0x123/0x210 [zfs]
[16932.253240] [] ? ddt_get_dedup_stats+0x3a/0x60 [zfs]
[16932.253243] [] ? mod_timer+0x12a/0x1e0
[16932.253260] [] ? dsl_pool_sync+0xb1/0x470 [zfs]
[16932.253267] [] ? spl_kmem_free+0x2a/0x40 [spl]
[16932.253292] [] ? spa_update_dspace+0x26/0x40 [zfs]
[16932.253315] [] ? spa_sync+0x3a2/0xb20 [zfs]
[16932.253321] [] ? autoremove_wake_function+0x12/0x40
[16932.253327] [] ? read_tsc+0x9/0x20
[16932.253350] [] ? txg_sync_thread+0x36b/0x630 [zfs]
[16932.253357] [] ? sched_clock+0x9/0x10
[16932.253380] [] ? txg_quiesce_thread+0x380/0x380 [zfs]
[16932.253388] [] ? thread_generic_wrapper+0x71/0x80 [spl]
[16932.253395] [] ? __thread_exit+0x20/0x20 [spl]
[16932.253402] [] ? kthread+0xd2/0xf0
[16932.253408] [] ? kthread_create_on_node+0x1c0/0x1c0
[16932.253413] [] ? ret_from_fork+0x7c/0xb0
[16932.253415] [] ? kthread_create_on_node+0x1c0/0x1c0
{noformat}
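All of the txg_sync frames are "?" entries, so the stack is approximate, but it looks like the thread is in __cv_wait_io() under zio_wait() inside dsl_pool_sync()/spa_sync(), i.e. waiting on I/O completion rather than spinning in ZFS code. One way to tell whether sync is still making slow progress or is truly wedged is to watch the per-pool txg kstat. A small polling sketch follows, assuming a pool named tank and that the txgs kstat is populated (zfs_txg_history set to a nonzero value); both are assumptions about the local setup:
{noformat}
/*
 * Diagnostic sketch, not part of ZFS: sample the txg kstat a few times.
 * If the highest txg number never changes between samples, the sync
 * thread is stuck; if it keeps creeping forward, sync is just slow.
 * The pool name "tank" and the kstat path are assumptions -- adjust
 * them for the local system.
 */
#include <stdio.h>
#include <unistd.h>

#define TXG_KSTAT "/proc/spl/kstat/zfs/tank/txgs"

int main(void)
{
        char line[512];
        int i;

        for (i = 0; i < 10; i++) {      /* ten samples, five seconds apart */
                FILE *f = fopen(TXG_KSTAT, "r");

                if (f == NULL) {
                        perror(TXG_KSTAT);
                        return 1;
                }
                while (fgets(line, sizeof (line), f) != NULL)
                        fputs(line, stdout);
                fclose(f);
                puts("----");
                sleep(5);
        }
        return 0;
}
{noformat}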