
making sure the last quiesced txg is synced #8239

Open · wants to merge 6 commits into master
Conversation

@xwcal commented Jan 5, 2019

Motivation and Context

See #8233

Consider this scenario (see txg.c ):
There is heavy write load when the pool exports.
After txg_sync_stop's call to txg_wait_synced returns, many more txgs get processed, but right before txg_sync_stop takes tx_sync_lock, the following happens:

  • txg_sync_thread begins waiting on tx_sync_more_cv.
  • txg_quiesce_thread gets done with txg_quiesce(dp, txg).
  • txg_sync_stop takes tx_sync_lock first, sets tx_exiting = 1, calls cv_broadcast on the condition variables, and waits for the threads to exit.
  • txg_sync_thread wakes up first and exits.
  • Finally, txg_quiesce_thread gets tx_sync_lock, and calls cv_broadcast(&tx->tx_sync_more_cv),
    but txg_sync_thread is already gone, and the txg in txg_quiesce(dp, txg) above never gets synced.

Description

txg_sync_thread now waits for txg_quiesce_thread to exit and, if necessary, runs one more sync before exiting.

How Has This Been Tested?

Did not test.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

Fixed a potential bug as described in openzfs#8233 (the scenario quoted above).

Signed-off-by: Leap Second <leapsecond@protonmail.com>
codecov bot commented Jan 5, 2019

Codecov Report

Merging #8239 into master will decrease coverage by 0.11%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8239      +/-   ##
==========================================
- Coverage   78.57%   78.45%   -0.12%     
==========================================
  Files         379      379              
  Lines      114924   114927       +3     
==========================================
- Hits        90299    90166     -133     
- Misses      24625    24761     +136
Flag      Coverage Δ
#kernel   78.98% <100%> (ø) ⬆️
#user     67.2%  <100%> (-0.19%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0b8e441...025861a.

Fixed checkstyle complaints:
./module/zfs/txg.c: 558: line > 80 characters
./module/zfs/txg.c: 562: line > 80 characters

Signed-off-by: Leap Second <leapsecond@protonmail.com>
Addressed checkstyle complaints:
./module/zfs/txg.c: 559: continuation should be indented 4 spaces
./module/zfs/txg.c: 564: continuation should be indented 4 spaces

Signed-off-by: Leap Second <leapsecond@protonmail.com>
Addressed checkstyle complaints:
./module/zfs/txg.c: 559: spaces instead of tabs
./module/zfs/txg.c: 559: continuation should be indented 4 spaces
./module/zfs/txg.c: 564: spaces instead of tabs
./module/zfs/txg.c: 564: continuation should be indented 4 spaces

Signed-off-by: Leap Second <leapsecond@protonmail.com>
@xwcal (Author) commented Jan 5, 2019

Kernel.org Built-in x86_64 (BUILD) keeps failing with the following error in the make log:

In file included from ./include/zfs/spl/sys/condvar.h:33:0,
from ./include/zfs/sys/zfs_context.h:38,
from ./include/zfs/sys/crypto/common.h:39,
from fs/zfs/icp/illumos-crypto.c:35:
./include/zfs/spl/sys/time.h: In function 'gethrestime':
./include/zfs/spl/sys/time.h:76:8: error: implicit declaration of function 'current_kernel_time64'; did you mean 'core_kernel_text'? [-Werror=implicit-function-declaration]
*ts = current_kernel_time64();
^~~~~~~~~~~~~~~~~~~~~
core_kernel_text
./include/zfs/spl/sys/time.h:76:6: error: incompatible types when assigning to type 'inode_timespec_t {aka struct timespec64}' from type 'int'
*ts = current_kernel_time64();
^
./include/zfs/spl/sys/time.h: In function 'gethrestime_sec':
./include/zfs/spl/sys/time.h:86:24: error: invalid initializer
inode_timespec_t ts = current_kernel_time64();
^~~~~~~~~~~~~~~~~~~~~
CC drivers/acpi/acpica/utresdecode.o
cc1: some warnings being treated as errors
make[3]: *** [fs/zfs/icp/illumos-crypto.o] Error 1
make[2]: *** [fs/zfs/icp] Error 2
make[1]: *** [fs/zfs] Error 2
make: *** [fs] Error 2

Any thoughts why?

EDIT:
Nvm, existing issue:
http://build.zfsonlinux.org/builders/Kernel.org%20Built-in%20x86_64%20%28BUILD%29/builds/24439
8501452
maybe relevant:
https://lkml.org/lkml/2018/12/17/110
https://lkml.org/lkml/2018/12/7/405
and
https://www.kernel.org/doc/html/latest/core-api/timekeeping.html#c.current_kernel_time64

These are replaced by ktime_get_coarse_real_ts64() and ktime_get_coarse_ts64(). However, a lot of code that wants coarse-grained times can use the simple ‘jiffies’ instead, while some drivers may actually want the higher resolution accessors these days.
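Removals like this are typically absorbed by a configure-time compat shim in the SPL time headers. A hypothetical sketch of the shape such a shim takes (the HAVE_* macro name and the configure check behind it are illustrative assumptions, not the actual change that fixed this build):

```c
/*
 * Hypothetical compat sketch for include/spl/sys/time.h; the HAVE_*
 * macro would be defined by a configure-time probe (not shown).
 */
static inline void
gethrestime(inode_timespec_t *ts)
{
#if defined(HAVE_KTIME_GET_COARSE_REAL_TS64)
	ktime_get_coarse_real_ts64(ts);	/* newer kernels (>= 4.20) */
#else
	*ts = current_kernel_time64();	/* older kernels */
#endif
}
```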

@xwcal (Author) commented Jan 6, 2019

Since this happened to my fork instead of the master, I am not sure about starting a new issue.
But since this could come up again I thought I might as well document it:

I was making trivial edits to please checkstyle (hope I didn't rival others on an early Saturday morning ;) ). After three successful runs on Ubuntu 18.04 x86_64 (TEST), the fourth one failed with

command timed out: 4200 seconds without output running ['runurl', 'https://raw.githubusercontent.com/zfsonlinux/zfs-buildbot/master/scripts/bb-test-zfstests.sh'], attempting to kill

success
success
success
fail

Here are some snippets from relevant logs:

Last few lines from 7.2.tests of the failed run:

Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_005_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_006_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_007_pos (run as root) [00:01] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_008_pos (run as root) [00:00] [PASS]
Terminated

The last few lines from 7.3.log:

Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_008_pos (run as root) [00:00] [PASS]
16:45:05.51 ASSERTION: Verify that destroying snapshots returns space to the pool.
16:45:05.52 NOTE: Populate the /mnt/testdir directory
16:45:05.52 SUCCESS: file_write -o create -f /mnt/testdir/file1 -b 8192 -c 20 -d 1
16:45:05.56 SUCCESS: zfs snapshot testpool/testfs@testsnap.1
16:45:05.57 SUCCESS: file_write -o create -f /mnt/testdir/file2 -b 8192 -c 20 -d 2
16:45:05.60 SUCCESS: zfs snapshot testpool/testfs@testsnap.2
16:45:05.61 SUCCESS: file_write -o create -f /mnt/testdir/file3 -b 8192 -c 20 -d 3
16:45:05.64 SUCCESS: zfs snapshot testpool/testfs@testsnap.3
16:45:05.64 SUCCESS: file_write -o create -f /mnt/testdir/file4 -b 8192 -c 20 -d 4
16:45:05.67 SUCCESS: zfs snapshot testpool/testfs@testsnap.4
16:45:05.67 SUCCESS: file_write -o create -f /mnt/testdir/file5 -b 8192 -c 20 -d 5
16:45:05.70 SUCCESS: zfs snapshot testpool/testfs@testsnap.5
16:45:05.70 SUCCESS: file_write -o create -f /mnt/testdir/file6 -b 8192 -c 20 -d 6
16:45:05.73 SUCCESS: zfs snapshot testpool/testfs@testsnap.6
16:45:05.74 SUCCESS: file_write -o create -f /mnt/testdir/file7 -b 8192 -c 20 -d 7
16:45:05.76 SUCCESS: zfs snapshot testpool/testfs@testsnap.7
16:45:05.77 SUCCESS: file_write -o create -f /mnt/testdir/file8 -b 8192 -c 20 -d 8
16:45:05.80 SUCCESS: zfs snapshot testpool/testfs@testsnap.8
16:45:05.80 SUCCESS: file_write -o create -f /mnt/testdir/file9 -b 8192 -c 20 -d 9
16:45:05.83 SUCCESS: zfs snapshot testpool/testfs@testsnap.9
16:45:05.86 SUCCESS: zfs destroy testpool/testfs@testsnap.1
16:45:05.89 SUCCESS: zfs destroy testpool/testfs@testsnap.2
16:45:05.92 SUCCESS: zfs destroy testpool/testfs@testsnap.3
16:45:05.95 SUCCESS: zfs destroy testpool/testfs@testsnap.4
16:45:05.98 SUCCESS: zfs destroy testpool/testfs@testsnap.5
16:45:06.02 SUCCESS: zfs destroy testpool/testfs@testsnap.6
16:45:06.05 SUCCESS: zfs destroy testpool/testfs@testsnap.7
16:45:06.08 SUCCESS: zfs destroy testpool/testfs@testsnap.8
16:45:06.11 SUCCESS: zfs destroy testpool/testfs@testsnap.9
16:45:06.15 SUCCESS: zpool sync testpool
16:45:06.15 After destroying snapshots, the space is returned to the pool.
16:45:06.15 NOTE: Performing local cleanup via log_onexit (cleanup)

And the relevant part in 7.4.console:

[13714.051945] WARNING: Unable to automount /mnt/testdir/.zfs/snapshot/testsnap.8/testpool/testfs@testsnap.8: 256
[13714.147827] WARNING: Unable to automount /mnt/testdir/.zfs/snapshot/testsnap.9/testpool/testfs@testsnap.9: 256
[13714.520884] WARNING: Unable to automount /mnt/testdir1/.zfs/snapshot/testsnap/testpool/testctr/testfs1@testsnap: 256
[13715.044997] WARNING: Unable to automount /mnt/testdir1/.zfs/snapshot/testsnap/testpool/testctr/testfs1@testsnap: 256
[13717.056079] VERIFY(txg_list_empty(&dp->dp_dirty_dirs, txg)) failed
[13717.061400] PANIC at spa.c:7869:spa_sync()
[13717.064799] Showing stack for process 10853
[13717.064801] CPU: 1 PID: 10853 Comm: txg_sync Tainted: P OE 4.15.0-1007-aws #7-Ubuntu
[13717.064802] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[13717.064803] Call Trace:
[13717.064812] dump_stack+0x63/0x8b
[13717.064824] spl_dumpstack+0x29/0x30 [spl]
[13717.064830] spl_panic+0xc8/0x110 [spl]
[13717.064835] ? _wake_up_common_lock+0x8e/0xc0
[13717.064928] ? queued_spin_unlock+0xb/0x10 [zfs]
[13717.064997] ? queued_spin_unlock+0xb/0x10 [zfs]
[13717.065064] spa_sync+0x1042/0x10b0 [zfs]
[13717.065136] txg_sync_thread+0x2ef/0x4a0 [zfs]
[13717.065205] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[13717.065214] thread_generic_wrapper+0x7d/0xc0 [spl]
[13717.065218] kthread+0x121/0x140
[13717.065225] ? IS_ERR+0x10/0x10 [spl]
[13717.065227] ? kthread_create_worker_on_cpu+0x70/0x70
[13717.065231] ? do_syscall_64+0x73/0x130
[13717.065234] ? SyS_exit_group+0x14/0x20
[13717.065237] ret_from_fork+0x35/0x40
[13897.124668] INFO: task kworker/u30:0:32082 blocked for more than 120 seconds.
[13897.129283] Tainted: P OE 4.15.0-1007-aws #7-Ubuntu
[13897.134130] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13897.139437] kworker/u30:0 D 0 32082 2 0x80000000
[13897.139446] Workqueue: events_freezable_power disk_events_workfn

Looks like the zfs destroy in cleanup triggered the panic, but I haven't learned enough of ZFS to figure out the cause.

@behlendorf can you take a look when you get time?

@behlendorf (Contributor) commented:

@seekfirstleapsecond thanks for opening the PR and looking into the failures. Don't worry about the kernel.org failures; they appear to be due to recent changes in an unreleased kernel and will need to be investigated independently. As for that last Ubuntu 18.04 x86_64 (TEST) failure, it's a known issue which is fairly hard to reproduce and which we occasionally see when running the ZTS. It still needs to be investigated and resolved.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jan 8, 2019
Signed-off-by: Leap Second <leapsecond@protonmail.com>
Signed-off-by: Leap Second <leapsecond@protonmail.com>
@behlendorf behlendorf requested a review from ahrens January 8, 2019 17:31
@c0d3z3r0 (Contributor) commented:

ping
