thread hung in txg_wait_open() forever in D state #3064
Comments
This was determined to be caused by dmu_tx_hold_free() reserving a large amount of memory for the ARC in a single TX. As a result, the TX could never be assigned and the thread kept looping on dmu_tx_assign(). Further details are available in the Lustre Jira issue: https://jira.hpdd.intel.com/browse/LU-5242. Some work may be needed in dmu_tx_hold_free() to reduce the worst-case estimate that caused this issue; that still needs to be investigated.
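For context, this is roughly the standard DMU transaction retry loop a caller would use around dmu_tx_hold_free()/dmu_tx_assign() (a sketch only, not the actual Lustre osd-zfs code; `os`, `object`, `offset`, and `length` are placeholder arguments). If the hold's worst-case ARC reservation can never be satisfied, dmu_tx_assign() returns ERESTART on every pass and this loop never exits:

```c
top:
	tx = dmu_tx_create(os);
	dmu_tx_hold_free(tx, object, offset, length);

	error = dmu_tx_assign(tx, TXG_NOWAIT);
	if (error != 0) {
		if (error == ERESTART) {
			dmu_tx_wait(tx);	/* wait for the next open txg */
			dmu_tx_abort(tx);
			goto top;		/* retry indefinitely */
		}
		dmu_tx_abort(tx);
		return (error);
	}

	/* ... perform the free in this tx ... */
	dmu_tx_commit(tx);
```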
In arc_tempreserve_space(), where the ERESTART was returned, the condition tests true only when "arc_no_grow || reserve > arc_c_max". In either case I don't think ERESTART is the correct return code: a retry would NOT make any difference unless an admin showed up and bumped zfs_arc_max above reserve. In the Lustre test, the TX was retried more than 1012288401 times to no avail. So I believe in this case a hard error should be returned to the caller of dmu_tx_assign().
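A minimal sketch of the kind of change being suggested (illustrative only, not the actual arc_tempreserve_space() source, and the surrounding logic is omitted): distinguish a reservation that can never fit from a transient shortage, so the caller fails instead of spinning.

```c
/*
 * Sketch only. ERESTART tells dmu_tx_assign() callers to retry, which
 * is only useful for a transient condition. If reserve exceeds
 * arc_c_max, no number of retries can help, so a hard error (ENOMEM
 * is used here as an example) lets the caller bail out.
 */
if (reserve > arc_c_max)
	return (SET_ERROR(ENOMEM));	/* can never be satisfied */

if (arc_no_grow)
	return (SET_ERROR(ERESTART));	/* transient; a retry may help */
```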
It seemed dmu_tx_count_free() estimated about 1G of memory overhead for this sparse object. There is only 1 L1 block, and my understanding is that there can't be more level-N blocks than level-(N-1) blocks, so even without scanning them all we know there can be at most DN_MAX_LEVELS indirect blocks. The 1G overhead is a gross overestimate. Also, once the TX was delayed, the TXGs began to move forward very fast, at about 5141 TXGs per second:
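To make that bound concrete, here is a hypothetical helper (not a function in the ZFS source; `nl1blks`, `nlevels`, and `epbs` are illustrative parameters, with `epbs` the log2 of block pointers per indirect block). It counts the most indirect blocks a free spanning `nl1blks` L1 blocks can touch; with a single L1 block it returns at most `nlevels - 1`, i.e. no more than DN_MAX_LEVELS blocks, nowhere near a 1G reservation.

```c
#include <stdint.h>

uint64_t
max_indirect_blocks(uint64_t nl1blks, int nlevels, int epbs)
{
	uint64_t total = 0;
	uint64_t blks = nl1blks;
	int level;

	/* Walk up the tree: level N can never have more blocks than level N-1. */
	for (level = 1; level < nlevels; level++) {
		total += blks;
		blks = (blks + (1ULL << epbs) - 1) >> epbs;	/* parents at the next level up */
	}
	return (total);
}
```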
We run into this hang quite often with SPL/ZFS v0.6.3-1.2 (DEBUG mode). A thread hung in txg_wait_open() in D state and would never recover, and it seemed the thread was actually running in D state rather than sleeping - the kernel hung-task watcher never warned about it, and top showed its CPU time growing.
The txg_sync/txg_quiesce threads seemed OK, busy alternating between S/D and R states. But the pool state seemed quite screwed up. The TXG # increased by 2186 in just 1 second:
The read/write tests were able to continue OK, i.e. no later calls to dmu_tx_assign() hung, until we tried to umount the dataset, which then hung as well.
This is easily reproducible and I have a crashdump available. Please let me know if any debug information is needed.