Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce overhead in zvol_write/zvol_read via existing hold #4316

Closed
wants to merge 2 commits into from

Conversation

ryao
Copy link
Contributor

@ryao ryao commented Feb 6, 2016

Our zvol_write function had severely diverged with illumos-gate in a
manner that caused multiple regressions:

  1. We did dnode_hold() on each IO when we already had a hold.
  2. We would attempt to send writes that exceeded DMU_MAX_ACCESS to the
    DMU.
  3. We could call zil_commit() twice. In this case, this is because
    Linux uses the ->write function to send flushes and can aggregate the
    flush with a write. If a synchronous write occurred with the flush, we
    effectively flushed twice when there is no need to do that.

Signed-off-by: Richard Yao ryao@gentoo.org

@ryao
Copy link
Contributor Author

ryao commented Feb 6, 2016

I probably should elaborate on FLUSH+FUA semantics. I suspect the correct interpretation is that FLUSH applies to data that preceded this IO while FUA applies to the IO itself. Since the ZIL records are a linked list, we have an ordering such that we implicitly protect data that preceeded this IO on power failure without allowing this IO to appear. If the transaction group closes while we are still writing a large write, everything prior would be committed to disk while the new write would be partially written, provided that it is being broken into multiple transactions. Since we do not guarantee anything made it to disk at all until we return, we only need to call zil_commit once.

@ryao
Copy link
Contributor Author

ryao commented Feb 6, 2016

I noticed that zvol_read() suffers from the same problems, so I pushed a couple of commits to fix it. There are two commits rather than one because illumos-gate has the same bug where it uses dmu_read_uio() instead of dmu_read_uio_dbuf() too.


error = dmu_read_uio_dbuf(zv->zv_dbuf, ZVOL_OBJ, &uio, bytes);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need ZVOL_OBJ as per function:
dmu_read_uio_dbuf(dmu_buf_t *zdb, uio_t *uio, uint64_t size)

error = dmu_read_uio_dbuf(zv->zv_dbuf, &uio, bytes);
That should be correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I corrected that typo and repushed. :)

@ryao ryao changed the title Resynchronize zvol_write with illumos-gate Reduce overhead in zvol_write/zvol_read via existing hold Feb 6, 2016
@ryao
Copy link
Contributor Author

ryao commented Feb 8, 2016

@behlendorf I would like to squash this PR into 1 or 2 patches and rewrite the commit message(s) before any of this is merged. I say 1 or 2 because I am undecided on whether I should break out the Call dmu_read_uio_dbuf() in zvol_read() patch that is applicable to illumos-gate.

@ryao ryao force-pushed the zvol branch 2 times, most recently from 7be5e80 to 4f47945 Compare February 13, 2016 04:23
@behlendorf
Copy link
Contributor

I doubt this will make any significant impact to performance and the bugs addressed aren't harmful but it's a useful bit of cleanup. It nicely builds on the work done by @tuxoko which allowed bio_vec's to be safely passed in a uio_t. @tuxoko this LGTM to be but I'd appreciate it if you could please review this too.

@behlendorf
Copy link
Contributor

@ryao let's keep the patches separate as you've done.

@ryao
Copy link
Contributor Author

ryao commented Feb 17, 2016

@behlendorf At this point, I think further improvements in performance will involve micro-optimizations of this nature. There really is no obvious bottleneck for latency or source of CPU usage, although it would be nice to make further reductions in both categories.

@dweeezil
Copy link
Contributor

Since I was using this patch while working on #4265, I figured I'd post here that this patch worked perfectly fine for me and seems to be a nice bit of cleanup. I never did measure its performance impact, but as has been pointed out, it's likely to be fairly small.

In illumos-gate, `zvol_read` and `zvol_write` are both passed uio_t
rather than bio_t. Since we are translating from bio to uio for both, we
might as well unify the logic and have code more similar to its illumos
counterpart. At the same time, we can fix some regressions that occurred
versus the original code from illumos-gate.

We refactor zvol_write to take uio and also correct the
following problems:

1. We did `dnode_hold()` on each IO when we already had a hold.
2. We would attempt to send writes that exceeded `DMU_MAX_ACCESS` to the
DMU.
3. We could call `zil_commit()` twice. In this case, this is because
Linux uses the `->write` function to send flushes and can aggregate the
flush with a write. If a synchronous write occurred with the flush, we
effectively flushed twice when there is no need to do that.

zvol_read also suffers from the first two problems. Other platforms
suffer from the first, so we leave that for a second patch so that there
is a discrete patch for them to cherry-pick.

Signed-off-by: Richard Yao <ryao@gentoo.org>
The difference between `dmu_read_uio()` and `dmu_read_uio_dbuf()` is
that the former takes a hold while the latter uses an existing hold.
`zfs_read()` in the ZPL will use `dmu_read_uio_dbuf()` while
our analogous `zvol_write()` will use `dmu_write_uio_dbuf()`, but for no
apparent reason, we inherited a `zvol_read()` function from
OpenSolaris that does `dmu_read_uio()`. illumos-gate also still
uses `dmu_read_uio()` to this day. Lets switch to `dmu_read_uio_dbuf()`,
which is more performant.

Signed-off-by: Richard Yao <ryao@gentoo.org>
@ryao
Copy link
Contributor Author

ryao commented Feb 17, 2016

@tuxoko I have updated this to address your comments on the !! operator that you made on b7a908a.

As I said on the comment, your suggestion on breaking at a block boundary is an improvement that I would prefer to leave for another patch because the first one is partly intended to allow improvements that apply to both ZoL and illumos-gate to be easily cherry-picked and your suggestion applies to code specific to illumos-gate. Also, I am not sure how much that matters in practice because we are breaking up IO on a 32MB boundary and IOs of such size are very rare. It is probably an improvement that we can do separately when one of us has some time to spend working it out. We also do not need to break on the block boundary as we only need to break on the record boundary. After doing that, 32MB chunks will fill the buffers every time without any need to perform reads.

@tuxoko
Copy link
Contributor

tuxoko commented Feb 17, 2016

@ryao
Right, 32MB should be very unlikely.

@tuxoko
Copy link
Contributor

tuxoko commented Feb 17, 2016

@behlendorf
This LGTM.

behlendorf pushed a commit that referenced this pull request Feb 18, 2016
The difference between `dmu_read_uio()` and `dmu_read_uio_dbuf()` is
that the former takes a hold while the latter uses an existing hold.
`zfs_read()` in the ZPL will use `dmu_read_uio_dbuf()` while
our analogous `zvol_write()` will use `dmu_write_uio_dbuf()`, but for no
apparent reason, we inherited a `zvol_read()` function from
OpenSolaris that does `dmu_read_uio()`. illumos-gate also still
uses `dmu_read_uio()` to this day. Lets switch to `dmu_read_uio_dbuf()`,
which is more performant.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #4316
@behlendorf
Copy link
Contributor

Merged as:

19a47cb Call dmu_read_uio_dbuf() in zvol_read()
a765a34 Clean up zvol request processing to pass uio and fix porting regressions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants