New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dmu buf hold array opt zio #9836
Conversation
Enable picky cstyle checks and resolve the new warnings. The vast
majority of the changes needed were to handle minor issues with
whitespace formatting. This patch contains no functional changes.
Non-whitespace changes are as follows:
* 8 times ; to { } in for/while loop
* fix missing ; in cmd/zed/agents/zfs_diagnosis.c
* comment (confim -> confirm)
* change endline , to ; in cmd/zpool/zpool_main.c
* a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks
* /* CSTYLED */ markers
* change == 0 to !
* ulong to unsigned long in module/zfs/dsl_scan.c
* rearrangement of module_param lines in module/zfs/metaslab.c
* add { } block around statement after for_each_online_node
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5465
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find. When possible being able to skip creating and waiting on that root zio looks like it's a nice performance win. A few comments inline, but the intent of this makes good sense to me.
Would you mind rebasing this on the latest master branch before force updating the PR.
| @@ -474,20 +474,23 @@ dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, uint64_t length, | |||
| } | |||
| dbp = kmem_zalloc(sizeof (dmu_buf_t *) * nblks, KM_SLEEP); | |||
|
|
|||
| zio = zio_root(dn->dn_objset->os_spa, NULL, NULL, ZIO_FLAG_CANFAIL); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be easier to read this code if we always created the zio for when read is set. We should get all the benefit of your change in the common case. And only in the rare error case where the dbuf_hold() fails and we return EIO would be incur the allocation/nowait cost. Then the changes on lines 483 and 489 wouldn't be needed.
- zio = zio_root(dn->dn_objset->os_spa, NULL, NULL, ZIO_FLAG_CANFAIL);
+ if (read) {
+ zio = zio_root(dn->dn_objset->os_spa, NULL, NULL,
+ ZIO_FLAG_CANFAIL);
+ }
+| if (err) { | ||
| dmu_buf_rele_array(dbp, nblks, tag); | ||
| return (err); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this whole block could be pulled under the existing read conditional below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf thanks for the review, please have a look at slighly updated version - basically it helps reads too. this doesn't address your concerns yet, but I'm fine to fix those of course, if the approach taken in the patch makes some sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The write side looks good. But I'm fairly certain we still need to call dbuf_read even when the db_state == DB_CACHED since it's possible the cached dbuf still needs to be decompressed or decrypted. Since you saw a significant performance win in your testing by skipping DB_CACHED dbufs it'd be interesting to know which bit of the db->db_state == DB_CACHED case in dbuf_read() was slowing things down. Perhaps that can be optimized.
this improves one Lustre test from 90s to 81s (sanity/60a): average read (from cache) went from 7.7 usec to 5.6 usec average write went from 2.7 usec to 1.7 usec all reads and writes are small (<=8K) in this test. Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
09d24fc
to
0705862
Compare
|
Closing. A version of this optimization was applied in commit ec50cd2. |
dmu_buf_hold_array_by_dnode() can create zio on demand saving on few allocations.
Motivation and Context
Description
How Has This Been Tested?
tested with Lustre using number of different tests (including well-known like dbench, iozone)
mostly in KVM
Lustre's sanity/60a takes 81s intead of 90s w/o the patch.
Types of changes
Checklist:
Signed-off-by.