-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to discard invalid seeds #203
Conversation
While thinking more about that I think that having an option to switch between "try to continue without the seed" and "bail out" might be useful, and it also gives us the ability to expand it later to do even more clever things instead of just entirely discarding the seed (e.g. recreate it). |
18246ba
to
e11a329
Compare
e11a329
to
fa206bd
Compare
Now by default I left the current behavior of failing on invalid seeds and added a new option I created a type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I'll need a bit more time to review this since it's in a rather complex part of the code base, but I should be able to get to it later this week
fa206bd
to
c1b8a1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. Please take a look at the performance issue and let me know what you think.
_, err := cmd.ExecuteC() | ||
require.Error(t, err) | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This currently depends on desync detecting any errors in the assembly step. Given the complexities of dealing with corrupted seeds, do you think it'd make sense to validate the output file? Like perhaps read it back from disk and calculating the SHA256 of it, then comparing to the expected value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, something similar to the validate()
that it's currently in fileseed.go
, but doing it against the output file.
I guess I can look into it. Would you prefer it to be in this PR or in a followup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another PR is just fine. You could just hash the whole output file and compare it against a fixed value I suppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expected hash for the whole file is it already available in the index file or should we calculate that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that you were commenting on extract_test
. When you talked about comparing the final hash, were you referring only to the tests cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but that can also wait for another PR. I just think given how easy it is to get the seed-algorithms wrong, it'd be good to check that the whole file is assembled correctly in tests. It'd be a little tricky to come up with good test seeds though, since it should cover file-seeds, self-seeds, null-seed etc in specific combinations to exercise all paths properly. On the other hand, having a predicatable seed plan now (it was non-deterministic before) should help.
aca9b26
to
3197da5
Compare
_, err := cmd.ExecuteC() | ||
require.Error(t, err) | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another PR is just fine. You could just hash the whole output file and compare it against a fixed value I suppose
Could you take a look at the two extra fields that were added to IndexChunk? I suspect they're leftover from an earlier implementation and can be removed. Other than that this looks good. Thanks for implementing this. It opens up more possibilities as well |
7d32123
to
1ca0417
Compare
1ca0417
to
3f9c4ef
Compare
3f9c4ef
to
fde373f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good. Did you check performance? I suspect it's about equivalent to before
fde373f
to
c48c414
Compare
I ran a couple of times an extraction and this patch was in average 6 seconds slower (4m30s vs 4m24s) with a cold cache, and about the same speed with a hot cache. Apparently the downside is that now the progress bar doesn't report any progress for the first few seconds, while it generates the plan. IMHO that's fine, but if you'd like to show some sort of progress we can figure something out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. Please take a look at my last comment and then I think this is good to merge in.
As for the progressbar, I agree that it's not critical to have right now. It'd be quite simple to wire up and introduce a whole step in it. If you look at desync make
for example, you can see there's multiple stages in the progressbar. So you could make "Planning" and "Validating" stages themselves in the progressbar and calculate it based on the # of chunks to get through. Planning would be very quick, I suspect the 3s you saw was spent in the validation step.
12f18b6
to
9bd77d9
Compare
If we have a seed index for a file/device that is corrupted we may want to try harder and continue anyway by discarding the invalid seed and fallback to the potentially other seeds and/or by just using the store. This has been added with a new option "--skip-invalid-seeds". In order to not reduce the installation speed, the process of choosing and validating the chunks has been refactored, introducing a new concept called "plan". We create a plan about where to look for the chunks ahead of time, before starting to actually write in the destination. Signed-off-by: Ludovico de Nittis <ludovico.denittis@collabora.com>
9bd77d9
to
9545c56
Compare
Thank you for working through all of this |
If we have a seed index for a file/device that is corrupted we should
try harder to continue anyway by discarding the invalid seed and
fallback to the potentially other seeds and/or by just using the store.
Partially addresses #117
At the beginning I tried to just mark the unexpected chunk as invalid, but realigning the "good" chunks to compensate the corrupted values required too much work and was probably not worth it.
Please let me know if you'd prefer to have a launch option to control this behavior or if it's okay to just always do it.