Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to discard invalid seeds #203

Merged
merged 1 commit into from
Dec 30, 2021

Conversation

RyuzakiKK
Copy link
Contributor

If we have a seed index for a file/device that is corrupted we should
try harder to continue anyway by discarding the invalid seed and
fallback to the potentially other seeds and/or by just using the store.

Partially addresses #117


At the beginning I tried to just mark the unexpected chunk as invalid, but realigning the "good" chunks to compensate the corrupted values required too much work and was probably not worth it.

Please let me know if you'd prefer to have a launch option to control this behavior or if it's okay to just always do it.

@RyuzakiKK
Copy link
Contributor Author

RyuzakiKK commented Dec 8, 2021

While thinking more about that I think that having an option to switch between "try to continue without the seed" and "bail out" might be useful, and it also gives us the ability to expand it later to do even more clever things instead of just entirely discarding the seed (e.g. recreate it).

@RyuzakiKK RyuzakiKK marked this pull request as draft December 8, 2021 09:12
@RyuzakiKK RyuzakiKK changed the title Discard seed if it has invalid chunks Add option to discard invalid seeds Dec 8, 2021
@RyuzakiKK
Copy link
Contributor Author

Now by default I left the current behavior of failing on invalid seeds and added a new option --skip-invalid-seeds to discard them and continue.

I created a type InvalidSeedAction instead of a single boolean to make it easier in the future to add additional actions that we might want to add.

@RyuzakiKK RyuzakiKK marked this pull request as ready for review December 8, 2021 16:49
Copy link
Owner

@folbricht folbricht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I'll need a bit more time to review this since it's in a rather complex part of the code base, but I should be able to get to it later this week

assemble.go Outdated Show resolved Hide resolved
Copy link
Owner

@folbricht folbricht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Please take a look at the performance issue and let me know what you think.

_, err := cmd.ExecuteC()
require.Error(t, err)
})
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently depends on desync detecting any errors in the assembly step. Given the complexities of dealing with corrupted seeds, do you think it'd make sense to validate the output file? Like perhaps read it back from disk and calculating the SHA256 of it, then comparing to the expected value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, something similar to the validate() that it's currently in fileseed.go, but doing it against the output file.

I guess I can look into it. Would you prefer it to be in this PR or in a followup?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another PR is just fine. You could just hash the whole output file and compare it against a fixed value I suppose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected hash for the whole file is it already available in the index file or should we calculate that as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that you were commenting on extract_test. When you talked about comparing the final hash, were you referring only to the tests cases?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that can also wait for another PR. I just think given how easy it is to get the seed-algorithms wrong, it'd be good to check that the whole file is assembled correctly in tests. It'd be a little tricky to come up with good test seeds though, since it should cover file-seeds, self-seeds, null-seed etc in specific combinations to exercise all paths properly. On the other hand, having a predicatable seed plan now (it was non-deterministic before) should help.

assemble.go Outdated Show resolved Hide resolved
fileseed.go Outdated Show resolved Hide resolved
fileseed.go Outdated Show resolved Hide resolved
sequencer.go Outdated Show resolved Hide resolved
@RyuzakiKK RyuzakiKK force-pushed the invalid_chunks branch 2 times, most recently from aca9b26 to 3197da5 Compare December 14, 2021 16:28
_, err := cmd.ExecuteC()
require.Error(t, err)
})
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another PR is just fine. You could just hash the whole output file and compare it against a fixed value I suppose

index.go Outdated Show resolved Hide resolved
sequencer.go Outdated Show resolved Hide resolved
sequencer.go Outdated Show resolved Hide resolved
@folbricht
Copy link
Owner

Could you take a look at the two extra fields that were added to IndexChunk? I suspect they're leftover from an earlier implementation and can be removed. Other than that this looks good. Thanks for implementing this. It opens up more possibilities as well

@RyuzakiKK RyuzakiKK force-pushed the invalid_chunks branch 2 times, most recently from 7d32123 to 1ca0417 Compare December 15, 2021 14:38
sequencer.go Outdated Show resolved Hide resolved
sequencer.go Show resolved Hide resolved
fileseed.go Show resolved Hide resolved
assemble.go Outdated Show resolved Hide resolved
assemble.go Outdated Show resolved Hide resolved
assemble.go Outdated Show resolved Hide resolved
Copy link
Owner

@folbricht folbricht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good. Did you check performance? I suspect it's about equivalent to before

sequencer.go Outdated Show resolved Hide resolved
@RyuzakiKK
Copy link
Contributor Author

Looking pretty good. Did you check performance? I suspect it's about equivalent to before

I ran a couple of times an extraction and this patch was in average 6 seconds slower (4m30s vs 4m24s) with a cold cache, and about the same speed with a hot cache.
In my test I used a partition with a size of 5gb and the plan was generated in ~7 seconds (with the cache dropped echo 3 | sudo tee /proc/sys/vm/drop_caches). With a hot cache it takes less than a second.

Apparently the downside is that now the progress bar doesn't report any progress for the first few seconds, while it generates the plan.
Unfortunately I don't think it's possible to do a realistic estimation regarding how much this step is gonna take compared to the whole update.

IMHO that's fine, but if you'd like to show some sort of progress we can figure something out.

Copy link
Owner

@folbricht folbricht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Please take a look at my last comment and then I think this is good to merge in.

As for the progressbar, I agree that it's not critical to have right now. It'd be quite simple to wire up and introduce a whole step in it. If you look at desync make for example, you can see there's multiple stages in the progressbar. So you could make "Planning" and "Validating" stages themselves in the progressbar and calculate it based on the # of chunks to get through. Planning would be very quick, I suspect the 3s you saw was spent in the validation step.

assemble.go Outdated Show resolved Hide resolved
@RyuzakiKK RyuzakiKK force-pushed the invalid_chunks branch 2 times, most recently from 12f18b6 to 9bd77d9 Compare December 28, 2021 11:42
assemble.go Outdated Show resolved Hide resolved
If we have a seed index for a file/device that is corrupted we may want
to try harder and continue anyway by discarding the invalid seed and
fallback to the potentially other seeds and/or by just using the store.

This has been added with a new option "--skip-invalid-seeds".

In order to not reduce the installation speed, the process of choosing
and validating the chunks has been refactored, introducing a new concept
called "plan".
We create a plan about where to look for the chunks ahead of time,
before starting to actually write in the destination.

Signed-off-by: Ludovico de Nittis <ludovico.denittis@collabora.com>
@folbricht folbricht merged commit a4c6fd2 into folbricht:master Dec 30, 2021
@folbricht
Copy link
Owner

Thank you for working through all of this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants