9485 Optimize possible split block search space #625
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for porting this. I only have one question/suggestion but other than that it looks good to me.
is != NULL; is = list_next(&iv->iv_splits, is)) { | ||
uint64_t is_copies = 0; | ||
|
||
for (int i = 0; i < is->is_children; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question which may lead a small optimization although I may be completely wrong.
In this loop we are going through each is_child:
If it has no data we skip it. If it has data we are looking at the rest of the is_child to see if any of them has the same data as our original is_child and mark them as duplicate, and finally we increment is_copies which we will use to get the number of combinations.
The above makes sense with the rest of the logic in this function but to me it sounds that what we call "accurate combinations" is not exactly accurate as the combinations themselves are not unique.
The variable is_copies
(that is later used to calculate combinations
) tells us how many of the is_children vdevs have their ic_data
field populated (e.g. not NULL). So that got me wondering, what if:
[1] we changed the name of is_copies
to something like is_split_versions
[2] in this loop instead of just skipping when ic_data == NULL
we also skip when ic_duplicate != -1
(meaning we visited this ic_child
before [in the inner loop] and we know that it has not a unique version of ic_data
)
[3] Later the outer loop (right after the inner loop is done) we increment is_split_versions
and we use it exactly like is_copies
.
This should give us the number of unique combinations. Now I know that this won't work exactly with the rest of the logic in this function but what if the struct indirect_split_t
had a field list_t is_unique_versions
(or some similar name) and every time is_split_versions
is incremented in this loop we add the current is_child
to this list.
Then later in this function we can use this new list without having to care, do checks, and skip duplicates because they are never visited. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sdimitro thanks for taking the time to review this.
"accurate combinations" is not exactly accurate as the combinations themselves are not unique.
Good catch. This appears to be a issue which snuck in when the patch was reworked. The intent was to calculate unique copies and the original patch did this by immediately freeing any dulicate is->is_child[i].ic_data
copies and setting is->is_child[i].ic_data = NULL
.
Subsequently the mechanism was reworked and is->is_child[j].ic_duplicate
was added instead because vdev_indirect_repair()
relied on having non-NULL abd_t
's for repair. In the process it looks like the loop logic wasn't entirely updated.
What you're proposing sounds like a reasonable way to fix this. I'll see about implementing the proposed changes in ZoL for @ahrens to add to this PR for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf let me know when you have the ZoL PR for this, and I'll pull it in here.
I've updated the code to pull in openzfs/zfs@1258bd7. @sdimitro could you take another look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found one minor nit. Code LGTM.
Are there any plans of doing any extra testing for this in illumos?
|
||
/* | ||
* Enable to simulate damaged segments and validate reconstruction. This | ||
* is intentionally not exposed as a module parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have "module parameters" in illumos so maybe we should take out this sentence just to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks.
I wasn't planning any additional testing beyond ztest / ZTS. My understanding is that ztest exercises the reconstruction code pretty well, but it's true that we aren't verifying which cases do deterministic vs random reconstruction. |
For what it's worth, even after merging this change to ZoL we are still seeing rare |
df35e34
to
1cedcda
Compare
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Remove duplicate segment copies to minimize the possible search space for reconstruction. Once reduced an accurate assessment can be made regarding the difficulty in reconstructing the block. Also, ztest will now run zdb with zfs_reconstruct_indirect_combinations_max set to 1000000 in an attempt to avoid checksum errors. Closes #625
1cedcda
to
c13489a
Compare
Port of openzfs/zfs@4589f3a
Author: @behlendorf
Remove duplicate segment copies to minimize the possible search
space for reconstruction. Once reduced an accurate assessment can
be made regarding the difficulty in reconstructing the block.
Also, ztest will now run zdb with
zfs_reconstruct_indirect_combinations_max set to 1000000 in an attempt
to avoid checksum errors.
Reviewed-by: Matthew Ahrens mahrens@delphix.com
Reviewed-by: Tim Chase tim@chase2k.com @dweeezil
Signed-off-by: Brian Behlendorf behlendorf1@llnl.gov