Improve rate at which new zvols are processed #8615
Conversation
4f8c9c9 to 939345a (force-pushed)
Thanks, that's a nice improvement!
One interesting thing I noticed that the CI caught was that apparently we somehow issued a flush to the block device when ZVOL_RDONLY wasn't set, but neither was ZVOL_OPENED_WR. This resulted in the following stack, which caused several of the bots to fail.
Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
task: ffff9b3752d18000 task.stack: ffffbb1c01154000
RIP: 0010:zil_commit+0x5/0x50 [zfs]
RSP: 0018:ffffbb1c01157db0 EFLAGS: 00010206
RAX: 0000000000040801 RBX: ffff9b37d1b99a00 RCX:
zvol_request+0x163/0x300 [zfs]
generic_make_request+0x123/0x300
submit_bio+0x6c/0x140
submit_bio_wait+0x57/0x80
blkdev_issue_flush+0x7c/0xb0
blkdev_fsync+0x2f/0x40
do_fsync+0x38/0x60
SyS_fsync+0xc/0x10
module/zfs/zvol.c (outdated revision):

	 * need to open a ZIL.
	 */
	zv->zv_flags |= ZVOL_OPENED_WR;
	zv->zv_zilog = zil_open(zv->zv_objset, zvol_get_data);
I think it would be a good idea to add an assertion to zvol_write() and zvol_discard() that ZVOL_OPENED_WR is set (or that zv->zv_zilog != NULL) since they may call zil_commit().
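For illustration, a minimal sketch of what such an assertion might look like; the actual zvol_write()/zvol_discard() signatures live in module/zfs/zvol.c, and the zv_request_t plumbing here is an assumption:

	/*
	 * Hypothetical sketch of the suggested assertion; zv_request_t
	 * is an assumed request wrapper, not the committed code.
	 */
	static void
	zvol_write(void *arg)
	{
		zv_request_t *zvr = arg;
		zvol_state_t *zv = zvr->zv;

		/* A write may end in zil_commit(), so a ZIL must be open. */
		ASSERT((zv->zv_flags & ZVOL_OPENED_WR) || zv->zv_zilog != NULL);

		/* ... existing write path ... */
	}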
Thanks for pointing that out. I'm not sure why running the zfs-tests on my machine didn't hit this issue, but I can reproduce it easily manually by opening the zvol read-only and then calling fsync() on it.
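A minimal userspace reproducer along those lines might look like the following (the device path is hypothetical):

	/*
	 * Minimal reproducer sketch: open the zvol read-only, then fsync it.
	 * The fsync reaches blkdev_issue_flush() and hence zvol_request(),
	 * matching the stack above. /dev/zd0 is a hypothetical zvol node.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		int fd = open("/dev/zd0", O_RDONLY);

		if (fd < 0) {
			perror("open");
			return (1);
		}
		if (fsync(fd) != 0)
			perror("fsync");
		close(fd);
		return (0);
	}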
939345a to 251daa3 (force-pushed)
Thanks for addressing that. It looks like the call path was the fsync() one shown in the stack above.
251daa3 to dd2c77d (force-pushed)
It turns out that a device can be written to even if it has never been opened with FMODE_WRITE. For example, in the following scenario the partition is opened twice, first read-only and then write-only; the zvol itself never receives an open with FMODE_WRITE.

The logic that controls this behavior is in the kernel's block device open path. Given that this is the way the kernel behaves, I've made the call to zil_open() a bit lazier. Now it gets called in zvol_request() the first time we get asked to do a write for a particular volume.
Codecov Report
@@            Coverage Diff            @@
##           master    #8615     +/-  ##
=========================================
- Coverage   79.34%   78.75%    -0.6%
=========================================
  Files         262      381     +119
  Lines       77786   117594   +39808
=========================================
+ Hits        61723    92606   +30883
- Misses      16063    24988    +8925
Continue to review full report at Codecov.
> I've made the call to zil_open a bit lazier. Now it gets called in zvol_request the first time we get asked to do a write for a particular volume.

This may be lazier but I think the resulting patch is much better and easier to reason about. My only concern is the introduction of taking the zv->zv_state_lock mutex in the main, relatively hot, zvol_request() call path.

What if instead we protected zv->zv_zilog with the existing zv->zv_suspend_lock? It's already taken as a reader in zvol_request(), so it'd be easy to add the zv->zv_zilog == NULL check, and only promote it to a writer for the zil_open(). zvol_setup_zv and zvol_shutdown_zv already assert that it's held. What do you think?
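A sketch of the suggested pattern, assuming the SPL rwlock primitives (rw_enter(), rw_exit(), rw_downgrade()) and the names from this thread; the actual zvol_request() code differs:

	/*
	 * Sketch only: lazily open the ZIL under zv_suspend_lock, as
	 * suggested above. A double check handles racing writers.
	 */
	rw_enter(&zv->zv_suspend_lock, RW_READER);

	if (bio_data_dir(bio) == WRITE && zv->zv_zilog == NULL) {
		/* Promote to writer only for the one-time zil_open(). */
		rw_exit(&zv->zv_suspend_lock);
		rw_enter(&zv->zv_suspend_lock, RW_WRITER);
		if (zv->zv_zilog == NULL)
			zv->zv_zilog = zil_open(zv->zv_objset, zvol_get_data);
		rw_downgrade(&zv->zv_suspend_lock);
	}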
The kernel function which adds new zvols as disks to the system, add_disk(), briefly opens and closes the zvol as part of its work. Closing a zvol involves waiting for two txgs to sync. This, combined with the fact that the taskq processing new zvols is single threaded, makes this processing of new zvols slow. Waiting for these txgs to sync is only necessary if the zvol has been written to, which is not the case during add_disk(). This change adds tracking of whether a zvol has been written to so that we can skip the txg_wait_synced() calls when they are unnecessary. This change also fixes the flags passed to blkdev_get_by_path() by vdev_disk_open() to be FMODE_READ | FMODE_WRITE | FMODE_EXCL instead of just FMODE_EXCL. The flags were being incorrectly calculated because we were using the wrong version of vdev_bdev_mode().

Signed-off-by: John Gallagher <john.gallagher@delphix.com>
dd2c77d to f6c0ad0 (force-pushed)
@behlendorf That makes sense to me. I've updated the PR with your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Does this alternate version of the PR result in a similar speed-up to the original? (I'd expect it to.) If we can get a second reviewer, let's see about getting this pretty small change in before 0.8.
Yes, it does.
The kernel function which adds new zvols as disks to the system, add_disk(), briefly opens and closes the zvol as part of its work. Closing a zvol involves waiting for two txgs to sync. This, combined with the fact that the taskq processing new zvols is single threaded, makes this processing of new zvols slow. Waiting for these txgs to sync is only necessary if the zvol has been written to, which is not the case during add_disk(). This change adds tracking of whether a zvol has been written to so that we can skip the txg_wait_synced() calls when they are unnecessary. This change also fixes the flags passed to blkdev_get_by_path() by vdev_disk_open() to be FMODE_READ | FMODE_WRITE | FMODE_EXCL instead of just FMODE_EXCL. The flags were being incorrectly calculated because we were using the wrong version of vdev_bdev_mode().

Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes openzfs#8526
Closes openzfs#8615
Motivation and Context
The kernel function which adds new zvols as disks to the system, add_disk(), briefly opens and closes the zvol as part of its work. Closing a zvol involves waiting for two txgs to sync. This, combined with the fact that the taskq processing new zvols is single threaded, can make this processing of new zvols quite slow.

Issue #8526
Description
Waiting for these txgs to sync is only necessary if the zvol has been written to, which is not the case during add_disk(). This change adds tracking of whether a zvol has been opened in write mode so that we can skip the txg_wait_synced() calls when they are unnecessary.

One of the txg_wait_synced() calls happens in zil_close(). To prevent this wait, this change avoids opening a ZIL for the zvol until the zvol is opened in write mode, so that the call to zil_close() can be skipped entirely when the zvol is closed. It might also be possible to prevent zil_close() from calling txg_wait_synced() if no data has been written to the ZIL. However, zil_close() calls zil_commit(), which dirties the ZIL even if nothing has been written to it yet. Not being too familiar with how the ZIL works, I wasn't sure how best to prevent that, so I opted to avoid opening a ZIL until it was needed.
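For illustration, a minimal sketch of the close-path skip, assuming the ZVOL_OPENED_WR flag from the earlier revision shown above; the actual change is in module/zfs/zvol.c:

	/*
	 * Hypothetical sketch of the close-path skip; flag name taken
	 * from the earlier revision, not necessarily the final code.
	 */
	static void
	zvol_shutdown_zv(zvol_state_t *zv)
	{
		if (zv->zv_zilog != NULL) {
			zil_close(zv->zv_zilog);	/* implies a txg wait */
			zv->zv_zilog = NULL;
		}

		/*
		 * Only wait for dirty data to sync if this zvol could
		 * actually have been written to; add_disk()'s brief
		 * open/close never sets ZVOL_OPENED_WR, so it skips
		 * both waits.
		 */
		if (zv->zv_flags & ZVOL_OPENED_WR)
			txg_wait_synced(dmu_objset_pool(zv->zv_objset), 0);
	}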
This change also fixes the flags passed to blkdev_get_by_path() by vdev_disk_open() to be FMODE_READ | FMODE_WRITE | FMODE_EXCL instead of just FMODE_EXCL. The flags were being incorrectly calculated because we were using the wrong version of vdev_bdev_mode(). Without this change, tests which create a zpool on top of a zvol would cause crashes because we would write to the zvol even though we hadn't passed in the FMODE_WRITE
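For illustration, a hedged sketch of the kind of translation vdev_bdev_mode() is expected to perform; this is an assumption, not the committed code, and the real function has kernel-version variants this omits:

	/*
	 * Sketch only: translate the spa-level FREAD/FWRITE open mode
	 * into the kernel fmode_t expected by blkdev_get_by_path().
	 */
	static fmode_t
	vdev_bdev_mode(int smode)
	{
		fmode_t mode = FMODE_EXCL;	/* vdevs are opened exclusively */

		if (smode & FREAD)
			mode |= FMODE_READ;
		if (smode & FWRITE)
			mode |= FMODE_WRITE;

		return (mode);
	}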
flag when we opened it.

How Has This Been Tested?
Tested by creating a large number of zvols in succession in a pool with a moderate write load and then waiting for all of the links to appear in /dev/zvol. This test ran ~500x faster with this change than it did without (5 sec vs 41 min).
Tested that writes to a zvol are still handled correctly when the zvol is opened multiple times concurrently in different modes. Also used bpftrace to confirm that the correct flags (FMODE_READ | FMODE_WRITE | FMODE_EXCL) are being passed into blkdev_get_by_path().
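A hypothetical illustration of that kind of concurrent-mode check; the device path and buffer size are invented for the example:

	/*
	 * Hypothetical concurrent-mode test: hold a read-only open of
	 * the zvol while writing through a separate write-only open,
	 * then read the data back. /dev/zd0 is an invented device path.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int
	main(void)
	{
		char buf[512] = "test pattern";
		char check[512];
		int rd = open("/dev/zd0", O_RDONLY);	/* first: read-only */
		int wr = open("/dev/zd0", O_WRONLY);	/* then: write-only */

		if (rd < 0 || wr < 0) {
			perror("open");
			return (1);
		}
		if (pwrite(wr, buf, sizeof (buf), 0) != sizeof (buf) ||
		    fsync(wr) != 0) {
			perror("write");
			return (1);
		}
		if (pread(rd, check, sizeof (check), 0) != sizeof (check) ||
		    memcmp(buf, check, sizeof (buf)) != 0) {
			fprintf(stderr, "readback mismatch\n");
			return (1);
		}
		close(rd);
		close(wr);
		return (0);
	}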