Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dmu buf hold array opt zio #9836

Closed
wants to merge 2 commits into from

Conversation

bzzz77
Copy link
Contributor

@bzzz77 bzzz77 commented Jan 13, 2020

dmu_buf_hold_array_by_dnode() can create zio on demand saving on few allocations.

Motivation and Context

Description

How Has This Been Tested?

tested with Lustre using number of different tests (including well-known like dbench, iozone)

mostly in KVM

Lustre's sanity/60a takes 81s intead of 90s w/o the patch.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • [X ] Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

Enable picky cstyle checks and resolve the new warnings.  The vast
majority of the changes needed were to handle minor issues with
whitespace formatting.  This patch contains no functional changes.

Non-whitespace changes are as follows:

* 8 times ; to { } in for/while loop
* fix missing ; in cmd/zed/agents/zfs_diagnosis.c
* comment (confim -> confirm)
* change endline , to ; in cmd/zpool/zpool_main.c
* a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks
* /* CSTYLED */ markers
* change == 0 to !
* ulong to unsigned long in module/zfs/dsl_scan.c
* rearrangement of module_param lines in module/zfs/metaslab.c
* add { } block around statement after for_each_online_node

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5465
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jan 13, 2020
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find. When possible being able to skip creating and waiting on that root zio looks like it's a nice performance win. A few comments inline, but the intent of this makes good sense to me.

Would you mind rebasing this on the latest master branch before force updating the PR.

@@ -474,20 +474,23 @@ dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, uint64_t length,
}
dbp = kmem_zalloc(sizeof (dmu_buf_t *) * nblks, KM_SLEEP);

zio = zio_root(dn->dn_objset->os_spa, NULL, NULL, ZIO_FLAG_CANFAIL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier to read this code if we always created the zio for when read is set. We should get all the benefit of your change in the common case. And only in the rare error case where the dbuf_hold() fails and we return EIO would be incur the allocation/nowait cost. Then the changes on lines 483 and 489 wouldn't be needed.

-       zio = zio_root(dn->dn_objset->os_spa, NULL, NULL, ZIO_FLAG_CANFAIL);
+       if (read) {
+               zio = zio_root(dn->dn_objset->os_spa, NULL, NULL,
+                   ZIO_FLAG_CANFAIL);
+       }
+

if (err) {
dmu_buf_rele_array(dbp, nblks, tag);
return (err);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this whole block could be pulled under the existing read conditional below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behlendorf thanks for the review, please have a look at slighly updated version - basically it helps reads too. this doesn't address your concerns yet, but I'm fine to fix those of course, if the approach taken in the patch makes some sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The write side looks good. But I'm fairly certain we still need to call dbuf_read even when the db_state == DB_CACHED since it's possible the cached dbuf still needs to be decompressed or decrypted. Since you saw a significant performance win in your testing by skipping DB_CACHED dbufs it'd be interesting to know which bit of the db->db_state == DB_CACHED case in dbuf_read() was slowing things down. Perhaps that can be optimized.

this improves one Lustre test from 90s to 81s (sanity/60a):
average read (from cache) went from 7.7 usec to 5.6 usec
average write went from 2.7 usec to 1.7 usec
all reads and writes are small (<=8K) in this test.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
@behlendorf behlendorf added Status: Revision Needed Changes are required for the PR to be accepted and removed Status: Code Review Needed Ready for review and testing labels Feb 4, 2020
@behlendorf behlendorf added the Type: Performance Performance improvement or performance problem label Aug 24, 2020
@behlendorf
Copy link
Contributor

Closing. A version of this optimization was applied in commit ec50cd2.

@behlendorf behlendorf closed this Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Revision Needed Changes are required for the PR to be accepted Type: Performance Performance improvement or performance problem
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants