This repository has been archived by the owner on Nov 7, 2019. It is now read-only.

9485 Optimize possible split block search space #625

Closed
wants to merge 1 commit into from

Conversation


@ahrens ahrens commented Apr 17, 2018

Port of openzfs/zfs@4589f3a
Author: @behlendorf

Remove duplicate segment copies to minimize the possible search
space for reconstruction. Once reduced an accurate assessment can
be made regarding the difficulty in reconstructing the block.

Also, ztest will now run zdb with
zfs_reconstruct_indirect_combinations_max set to 1000000 in an attempt
to avoid checksum errors.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tim Chase <tim@chase2k.com> @dweeezil
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

@ahrens ahrens requested a review from shartse April 17, 2018 17:35
@sdimitro sdimitro left a comment


Thank you for porting this. I only have one question/suggestion but other than that it looks good to me.

is != NULL; is = list_next(&iv->iv_splits, is)) {
uint64_t is_copies = 0;

for (int i = 0; i < is->is_children; i++) {


I have a question which may lead to a small optimization, although I may be completely wrong.

In this loop we are going through each is_child. If it has no data, we skip it. If it has data, we look at the remaining is_child entries to see whether any of them has the same data as our original is_child, and mark those as duplicates. Finally, we increment is_copies, which we later use to get the number of combinations.

The above makes sense with the rest of the logic in this function, but it sounds to me like what we call "accurate combinations" is not exactly accurate, as the combinations themselves are not unique.

The variable is_copies (which is later used to calculate combinations) tells us how many of the is_children vdevs have their ic_data field populated (i.e. not NULL). So that got me wondering, what if:
[1] we changed the name of is_copies to something like is_split_versions
[2] in this loop, instead of just skipping when ic_data == NULL, we also skip when ic_duplicate != -1 (meaning we visited this ic_child before [in the inner loop] and we know that it does not have a unique version of ic_data)
[3] later in the outer loop (right after the inner loop is done) we increment is_split_versions and use it exactly like is_copies.

This should give us the number of unique combinations. Now, I know that this won't work exactly with the rest of the logic in this function, but what if the struct indirect_split_t had a field list_t is_unique_versions (or some similar name), and every time is_split_versions is incremented in this loop we add the current is_child to this list?

Then later in this function we can use this new list without having to care, do checks, and skip duplicates because they are never visited. Thoughts?
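
For illustration only, here is a minimal standalone sketch of the counting idea described above. It is not the code from this patch: the `sketch_*` types and functions are hypothetical stand-ins for indirect_split_t and its children, plain buffers compared with memcmp() stand in for abd_t data, and a fixed-size index array stands in for the proposed is_unique_versions list. It also assumes every split has at most SKETCH_MAX_CHILDREN copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SKETCH_MAX_CHILDREN	8	/* assumed upper bound, sketch only */

/* Hypothetical stand-in for one copy of a split segment. */
typedef struct sketch_child {
	const void *sc_data;	/* NULL if this copy could not be read */
	int sc_duplicate;	/* index of an earlier identical copy, or -1 */
} sketch_child_t;

/* Hypothetical stand-in for one split and its copies. */
typedef struct sketch_split {
	size_t ss_size;		/* bytes in each copy */
	int ss_children;	/* number of copies */
	sketch_child_t ss_child[SKETCH_MAX_CHILDREN];
	int ss_unique[SKETCH_MAX_CHILDREN];	/* indices of unique copies */
	int ss_unique_count;
} sketch_split_t;

/*
 * Count the unique versions of one split. A child is skipped when it has
 * no data or when an earlier pass already marked it as a duplicate.
 */
static uint64_t
sketch_count_unique(sketch_split_t *ss)
{
	ss->ss_unique_count = 0;
	for (int i = 0; i < ss->ss_children; i++)
		ss->ss_child[i].sc_duplicate = -1;

	for (int i = 0; i < ss->ss_children; i++) {
		sketch_child_t *ci = &ss->ss_child[i];

		if (ci->sc_data == NULL || ci->sc_duplicate != -1)
			continue;

		/* Mark every later copy with identical contents. */
		for (int j = i + 1; j < ss->ss_children; j++) {
			sketch_child_t *cj = &ss->ss_child[j];

			if (cj->sc_data == NULL || cj->sc_duplicate != -1)
				continue;
			if (memcmp(ci->sc_data, cj->sc_data,
			    ss->ss_size) == 0)
				cj->sc_duplicate = i;
		}

		/* Only unique versions are recorded and counted. */
		ss->ss_unique[ss->ss_unique_count++] = i;
	}
	return ((uint64_t)ss->ss_unique_count);
}

/*
 * The search space is the product of the unique-version counts of all
 * splits. Returns 0 if some split has no readable copy, and saturates
 * at UINT64_MAX rather than overflowing.
 */
static uint64_t
sketch_combinations(sketch_split_t *splits, int nsplits)
{
	uint64_t combinations = 1;

	for (int s = 0; s < nsplits; s++) {
		uint64_t n = sketch_count_unique(&splits[s]);

		if (n == 0)
			return (0);
		if (combinations > UINT64_MAX / n)
			return (UINT64_MAX);
		combinations *= n;
	}
	return (combinations);
}
```

With two splits whose readable copies are {"aaaa", "aaaa", "bbbb"} and {NULL, "cccc", "cccc"}, counting populated copies would give 3 * 2 = 6 combinations, while counting unique versions as sketched here gives 2 * 1 = 2. Later code could then walk only ss_unique instead of re-checking for duplicates.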


@sdimitro thanks for taking the time to review this.

"accurate combinations" is not exactly accurate as the combinations themselves are not unique.

Good catch. This appears to be an issue which snuck in when the patch was reworked. The intent was to calculate unique copies, and the original patch did this by immediately freeing any duplicate is->is_child[i].ic_data copies and setting is->is_child[i].ic_data = NULL.

Subsequently the mechanism was reworked and is->is_child[j].ic_duplicate was added instead because vdev_indirect_repair() relied on having non-NULL abd_t's for repair. In the process it looks like the loop logic wasn't entirely updated.

What you're proposing sounds like a reasonable way to fix this. I'll see about implementing the proposed changes in ZoL for @ahrens to add to this PR for review.


@behlendorf let me know when you have the ZoL PR for this, and I'll pull it in here.


ahrens commented Oct 3, 2018

I've updated the code to pull in openzfs/zfs@1258bd7. @sdimitro could you take another look at this?


@sdimitro sdimitro left a comment


I just found one minor nit. Code LGTM.

Are there any plans of doing any extra testing for this in illumos?


/*
* Enable to simulate damaged segments and validate reconstruction. This
* is intentionally not exposed as a module parameter.

We don't have "module parameters" in illumos so maybe we should take out this sentence just to avoid confusion.


Good catch, thanks.


ahrens commented Oct 3, 2018

I wasn't planning any additional testing beyond ztest / ZTS. My understanding is that ztest exercises the reconstruction code pretty well, but it's true that we aren't verifying which cases do deterministic vs random reconstruction.

@behlendorf

For what it's worth, even after merging this change to ZoL we are still seeing rare ztest failures where reconstruction is either impossible or possibly exceeds the allowed combinations. I don't see an issue with the reconstruction logic; it may still be possible that ztest can damage a pool beyond repair. For the curious, here are links to the ztest log and a ztest pool.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>

Remove duplicate segment copies to minimize the possible search
space for reconstruction.  Once reduced an accurate assessment can
be made regarding the difficulty in reconstructing the block.

Also, ztest will now run zdb with
zfs_reconstruct_indirect_combinations_max set to 1000000 in an attempt
to avoid checksum errors.

Closes #625
@prakashsurya

  • git-zfs-precommit on openzfs is here
  • git-zfs-precommit on illumos is here

@ahrens ahrens deleted the split-block branch June 27, 2020 03:20