
making sure the last quiesced txg is synced #8239

Open · wants to merge 6 commits into master
Conversation

@xwcal commented Jan 5, 2019

Motivation and Context

See #8233

Consider this scenario (see txg.c ):
There is heavy write load when the pool exports.
After txg_sync_stop's call to txg_wait_synced returns, many more txgs get processed, but right before txg_sync_stop takes tx_sync_lock, the following happens:

  • txg_sync_thread begins waiting on tx_sync_more_cv.
  • txg_quiesce_thread gets done with txg_quiesce(dp, txg).
  • txg_sync_stop takes tx_sync_lock first, sets tx_exiting = 1, calls cv_broadcast on the condition variables, and waits for the threads to exit.
  • txg_sync_thread wakes up first and exits.
  • Finally, txg_quiesce_thread gets tx_sync_lock, and calls cv_broadcast(&tx->tx_sync_more_cv),
    but txg_sync_thread is already gone, and the txg in txg_quiesce(dp, txg) above never gets synced.

Description

txg_sync_thread now waits for txg_quiesce_thread to exit and, if necessary, runs one more sync before exiting.

How Has This Been Tested?

Did not test.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

Fixed a potential bug as described in openzfs#8233 (the scenario quoted above).

Signed-off-by: Leap Second <leapsecond@protonmail.com>
codecov bot commented Jan 5, 2019

Codecov Report

Merging #8239 into master will decrease coverage by 0.11%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8239      +/-   ##
==========================================
- Coverage   78.57%   78.45%   -0.12%     
==========================================
  Files         379      379              
  Lines      114924   114927       +3     
==========================================
- Hits        90299    90166     -133     
- Misses      24625    24761     +136
Flag      Coverage Δ
#kernel   78.98% <100%> (ø) ⬆️
#user     67.2%  <100%> (-0.19%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0b8e441...025861a.

Fixed checkstyle complaints:
./module/zfs/txg.c: 558: line > 80 characters
./module/zfs/txg.c: 562: line > 80 characters

Signed-off-by: Leap Second <leapsecond@protonmail.com>
Addressed checkstyle complaints:
./module/zfs/txg.c: 559: continuation should be indented 4 spaces
./module/zfs/txg.c: 564: continuation should be indented 4 spaces

Signed-off-by: Leap Second <leapsecond@protonmail.com>
Addressed checkstyle complaints:
./module/zfs/txg.c: 559: spaces instead of tabs
./module/zfs/txg.c: 559: continuation should be indented 4 spaces
./module/zfs/txg.c: 564: spaces instead of tabs
./module/zfs/txg.c: 564: continuation should be indented 4 spaces

Signed-off-by: Leap Second <leapsecond@protonmail.com>
@xwcal (Author) commented Jan 5, 2019

Kernel.org Built-in x86_64 (BUILD) keeps failing with the following error in the make log:

In file included from ./include/zfs/spl/sys/condvar.h:33:0,
from ./include/zfs/sys/zfs_context.h:38,
from ./include/zfs/sys/crypto/common.h:39,
from fs/zfs/icp/illumos-crypto.c:35:
./include/zfs/spl/sys/time.h: In function 'gethrestime':
./include/zfs/spl/sys/time.h:76:8: error: implicit declaration of function 'current_kernel_time64'; did you mean 'core_kernel_text'? [-Werror=implicit-function-declaration]
*ts = current_kernel_time64();
^~~~~~~~~~~~~~~~~~~~~
core_kernel_text
./include/zfs/spl/sys/time.h:76:6: error: incompatible types when assigning to type 'inode_timespec_t {aka struct timespec64}' from type 'int'
*ts = current_kernel_time64();
^
./include/zfs/spl/sys/time.h: In function 'gethrestime_sec':
./include/zfs/spl/sys/time.h:86:24: error: invalid initializer
inode_timespec_t ts = current_kernel_time64();
^~~~~~~~~~~~~~~~~~~~~
CC drivers/acpi/acpica/utresdecode.o
cc1: some warnings being treated as errors
make[3]: *** [fs/zfs/icp/illumos-crypto.o] Error 1
make[2]: *** [fs/zfs/icp] Error 2
make[1]: *** [fs/zfs] Error 2
make: *** [fs] Error 2

Any thoughts why?

EDIT:
Nvm, existing issue:
http://build.zfsonlinux.org/builders/Kernel.org%20Built-in%20x86_64%20%28BUILD%29/builds/24439
8501452
maybe relevant:
https://lkml.org/lkml/2018/12/17/110
https://lkml.org/lkml/2018/12/7/405
and
https://www.kernel.org/doc/html/latest/core-api/timekeeping.html#c.current_kernel_time64

These are replaced by ktime_get_coarse_real_ts64() and ktime_get_coarse_ts64(). However, a lot of code that wants coarse-grained times can use the simple ‘jiffies’ instead, while some drivers may actually want the higher resolution accessors these days.
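Removals like this are typically absorbed by a configure-time compat shim in the SPL time headers. A hypothetical sketch of the shape such a shim takes (the HAVE_* macro name and the configure check behind it are illustrative assumptions, not the actual change that fixed this build):

```c
/*
 * Hypothetical compat sketch for include/spl/sys/time.h; the HAVE_*
 * macro would be defined by a configure-time probe (not shown).
 */
static inline void
gethrestime(inode_timespec_t *ts)
{
#if defined(HAVE_KTIME_GET_COARSE_REAL_TS64)
	ktime_get_coarse_real_ts64(ts);	/* newer kernels (>= 4.20) */
#else
	*ts = current_kernel_time64();	/* older kernels */
#endif
}
```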

@xwcal (Author) commented Jan 6, 2019

Since this happened to my fork instead of the master, I am not sure about starting a new issue.
But since this could come up again I thought I might as well document it:

I was making trivial edits to please checkstyle (hope I didn't rival others on an early Saturday morning ;) ). After three successful runs on Ubuntu 18.04 x86_64 (TEST), the fourth one failed with

command timed out: 4200 seconds without output running ['runurl', 'https://raw.githubusercontent.com/zfsonlinux/zfs-buildbot/master/scripts/bb-test-zfstests.sh'], attempting to kill

success
success
success
fail

Here are some snippets from relevant logs:

Last few lines from 7.2.tests of the failed run:

Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_005_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_006_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_007_pos (run as root) [00:01] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_008_pos (run as root) [00:00] [PASS]
Terminated

The last few lines from 7.3.log:

Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_008_pos (run as root) [00:00] [PASS]
16:45:05.51 ASSERTION: Verify that destroying snapshots returns space to the pool.
16:45:05.52 NOTE: Populate the /mnt/testdir directory
16:45:05.52 SUCCESS: file_write -o create -f /mnt/testdir/file1 -b 8192 -c 20 -d 1
16:45:05.56 SUCCESS: zfs snapshot testpool/testfs@testsnap.1
16:45:05.57 SUCCESS: file_write -o create -f /mnt/testdir/file2 -b 8192 -c 20 -d 2
16:45:05.60 SUCCESS: zfs snapshot testpool/testfs@testsnap.2
16:45:05.61 SUCCESS: file_write -o create -f /mnt/testdir/file3 -b 8192 -c 20 -d 3
16:45:05.64 SUCCESS: zfs snapshot testpool/testfs@testsnap.3
16:45:05.64 SUCCESS: file_write -o create -f /mnt/testdir/file4 -b 8192 -c 20 -d 4
16:45:05.67 SUCCESS: zfs snapshot testpool/testfs@testsnap.4
16:45:05.67 SUCCESS: file_write -o create -f /mnt/testdir/file5 -b 8192 -c 20 -d 5
16:45:05.70 SUCCESS: zfs snapshot testpool/testfs@testsnap.5
16:45:05.70 SUCCESS: file_write -o create -f /mnt/testdir/file6 -b 8192 -c 20 -d 6
16:45:05.73 SUCCESS: zfs snapshot testpool/testfs@testsnap.6
16:45:05.74 SUCCESS: file_write -o create -f /mnt/testdir/file7 -b 8192 -c 20 -d 7
16:45:05.76 SUCCESS: zfs snapshot testpool/testfs@testsnap.7
16:45:05.77 SUCCESS: file_write -o create -f /mnt/testdir/file8 -b 8192 -c 20 -d 8
16:45:05.80 SUCCESS: zfs snapshot testpool/testfs@testsnap.8
16:45:05.80 SUCCESS: file_write -o create -f /mnt/testdir/file9 -b 8192 -c 20 -d 9
16:45:05.83 SUCCESS: zfs snapshot testpool/testfs@testsnap.9
16:45:05.86 SUCCESS: zfs destroy testpool/testfs@testsnap.1
16:45:05.89 SUCCESS: zfs destroy testpool/testfs@testsnap.2
16:45:05.92 SUCCESS: zfs destroy testpool/testfs@testsnap.3
16:45:05.95 SUCCESS: zfs destroy testpool/testfs@testsnap.4
16:45:05.98 SUCCESS: zfs destroy testpool/testfs@testsnap.5
16:45:06.02 SUCCESS: zfs destroy testpool/testfs@testsnap.6
16:45:06.05 SUCCESS: zfs destroy testpool/testfs@testsnap.7
16:45:06.08 SUCCESS: zfs destroy testpool/testfs@testsnap.8
16:45:06.11 SUCCESS: zfs destroy testpool/testfs@testsnap.9
16:45:06.15 SUCCESS: zpool sync testpool
16:45:06.15 After destroying snapshots, the space is returned to the pool.
16:45:06.15 NOTE: Performing local cleanup via log_onexit (cleanup)

And the relevant part in 7.4.console:

[13714.051945] WARNING: Unable to automount /mnt/testdir/.zfs/snapshot/testsnap.8/testpool/testfs@testsnap.8: 256
[13714.147827] WARNING: Unable to automount /mnt/testdir/.zfs/snapshot/testsnap.9/testpool/testfs@testsnap.9: 256
[13714.520884] WARNING: Unable to automount /mnt/testdir1/.zfs/snapshot/testsnap/testpool/testctr/testfs1@testsnap: 256
[13715.044997] WARNING: Unable to automount /mnt/testdir1/.zfs/snapshot/testsnap/testpool/testctr/testfs1@testsnap: 256
[13717.056079] VERIFY(txg_list_empty(&dp->dp_dirty_dirs, txg)) failed
[13717.061400] PANIC at spa.c:7869:spa_sync()
[13717.064799] Showing stack for process 10853
[13717.064801] CPU: 1 PID: 10853 Comm: txg_sync Tainted: P OE 4.15.0-1007-aws #7-Ubuntu
[13717.064802] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[13717.064803] Call Trace:
[13717.064812] dump_stack+0x63/0x8b
[13717.064824] spl_dumpstack+0x29/0x30 [spl]
[13717.064830] spl_panic+0xc8/0x110 [spl]
[13717.064835] ? _wake_up_common_lock+0x8e/0xc0
[13717.064928] ? queued_spin_unlock+0xb/0x10 [zfs]
[13717.064997] ? queued_spin_unlock+0xb/0x10 [zfs]
[13717.065064] spa_sync+0x1042/0x10b0 [zfs]
[13717.065136] txg_sync_thread+0x2ef/0x4a0 [zfs]
[13717.065205] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[13717.065214] thread_generic_wrapper+0x7d/0xc0 [spl]
[13717.065218] kthread+0x121/0x140
[13717.065225] ? IS_ERR+0x10/0x10 [spl]
[13717.065227] ? kthread_create_worker_on_cpu+0x70/0x70
[13717.065231] ? do_syscall_64+0x73/0x130
[13717.065234] ? SyS_exit_group+0x14/0x20
[13717.065237] ret_from_fork+0x35/0x40
[13897.124668] INFO: task kworker/u30:0:32082 blocked for more than 120 seconds.
[13897.129283] Tainted: P OE 4.15.0-1007-aws #7-Ubuntu
[13897.134130] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13897.139437] kworker/u30:0 D 0 32082 2 0x80000000
[13897.139446] Workqueue: events_freezable_power disk_events_workfn

Looks like the zfs destroy in cleanup triggered the panic, but I haven't learned enough of ZFS to figure out the cause.

@behlendorf can you take a look when you get time?

@behlendorf (Contributor) commented:

@seekfirstleapsecond thanks for opening the PR and looking into the failures. Don't worry about the kernel.org failures; they appear to be due to recent changes in an unreleased kernel and will need to be investigated independently. As for that last Ubuntu 18.04 x86_64 (TEST) failure, it's a known issue which is fairly hard to reproduce and which we occasionally see when running the ZTS. It still needs to be investigated and resolved.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jan 8, 2019
Signed-off-by: Leap Second <leapsecond@protonmail.com>
Signed-off-by: Leap Second <leapsecond@protonmail.com>
@behlendorf behlendorf requested a review from ahrens January 8, 2019 17:31
@c0d3z3r0 (Contributor) commented:

ping
