Add option to discard invalid seeds #203

RyuzakiKK · 2021-12-07T10:46:15Z

If we have a seed index for a file/device that is corrupted, we should
try harder to continue anyway by discarding the invalid seed and
falling back to any other seeds and/or to just using the store.

Partially addresses #117

At first I tried to just mark the unexpected chunk as invalid, but realigning the "good" chunks to compensate for the corrupted values required too much work and was probably not worth it.

Please let me know if you'd prefer to have a launch option to control this behavior or if it's okay to just always do it.

RyuzakiKK · 2021-12-08T09:11:59Z

Thinking more about this, I believe an option to switch between "try to continue without the seed" and "bail out" might be useful. It also gives us the ability to expand it later to do even more clever things instead of just discarding the seed entirely (e.g. recreate it).

RyuzakiKK · 2021-12-08T16:49:08Z

I have now kept the current behavior of failing on invalid seeds as the default, and added a new option --skip-invalid-seeds to discard them and continue.

I created a type InvalidSeedAction instead of a single boolean to make it easier to add further actions in the future.

folbricht

Thank you. I'll need a bit more time to review this since it's in a rather complex part of the code base, but I should be able to get to it later this week.

folbricht

Great work. Please take a look at the performance issue and let me know what you think.

folbricht · 2021-12-12T15:09:40Z

cmd/desync/extract_test.go

+			_, err := cmd.ExecuteC()
+			require.Error(t, err)
+		})
+	}


This currently depends on desync detecting any errors in the assembly step. Given the complexities of dealing with corrupted seeds, do you think it'd make sense to validate the output file? For example, reading it back from disk, calculating its SHA256, and comparing that to the expected value.

Sure, something similar to the validate() that is currently in fileseed.go, but run against the output file.

I guess I can look into it. Would you prefer it to be in this PR or in a followup?

In another PR is just fine. You could just hash the whole output file and compare it against a fixed value I suppose

Is the expected hash for the whole file already available in the index file, or should we calculate that as well?

I just noticed that you were commenting on extract_test. When you talked about comparing the final hash, were you referring only to the test cases?

Yes, but that can also wait for another PR. I just think that, given how easy it is to get the seed algorithms wrong, it'd be good to check in tests that the whole file is assembled correctly. It'd be a little tricky to come up with good test seeds though, since they should cover file-seeds, self-seeds, null-seed etc. in specific combinations to exercise all paths properly. On the other hand, having a predictable seed plan now (it was non-deterministic before) should help.

folbricht · 2021-12-15T05:02:46Z

cmd/desync/extract_test.go

In another PR is just fine. You could just hash the whole output file and compare it against a fixed value I suppose

folbricht · 2021-12-15T05:23:14Z

Could you take a look at the two extra fields that were added to IndexChunk? I suspect they're leftover from an earlier implementation and can be removed. Other than that this looks good. Thanks for implementing this. It opens up more possibilities as well

folbricht

Looking pretty good. Did you check performance? I suspect it's about equivalent to before

RyuzakiKK · 2021-12-21T12:27:16Z

Looking pretty good. Did you check performance? I suspect it's about equivalent to before

I ran an extraction a couple of times, and this patch was on average 6 seconds slower (4m30s vs 4m24s) with a cold cache, and about the same speed with a hot cache.
In my test I used a 5 GB partition, and the plan was generated in ~7 seconds with the cache dropped (echo 3 | sudo tee /proc/sys/vm/drop_caches). With a hot cache it takes less than a second.

The downside is that the progress bar now doesn't report any progress for the first few seconds, while the plan is being generated.
Unfortunately, I don't think it's possible to make a realistic estimate of how long this step will take compared to the whole update.

IMHO that's fine, but if you'd like to show some sort of progress we can figure something out.

folbricht

Great work. Please take a look at my last comment and then I think this is good to merge in.

As for the progress bar, I agree that it's not critical to have right now. It'd be quite simple to wire up and introduce a whole step for it. If you look at desync make, for example, you can see there are multiple stages in the progress bar. So you could make "Planning" and "Validating" stages of their own in the progress bar and calculate progress based on the number of chunks to get through. Planning would be very quick; I suspect the 3s you saw was spent in the validation step.
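
A minimal sketch of the per-stage idea, assuming progress is simply counted in chunks. The stage type and its methods are hypothetical and do not reflect desync's actual progress bar interface.

```go
package main

import "fmt"

// stage is a hypothetical progress stage measured in chunks.
type stage struct {
	name  string
	total int // number of chunks to get through
	done  int // chunks processed so far
}

// Add records n more processed chunks, never exceeding the total.
func (s *stage) Add(n int) {
	s.done += n
	if s.done > s.total {
		s.done = s.total
	}
}

// Percent returns the completion of the stage as a whole percentage.
func (s *stage) Percent() int {
	if s.total == 0 {
		return 100
	}
	return s.done * 100 / s.total
}

func main() {
	const chunks = 4 // made-up number of chunks in the index

	// Each stage walks the same chunks, so both use the chunk count
	// as their total.
	for _, name := range []string{"Planning", "Validating"} {
		st := &stage{name: name, total: chunks}
		for i := 0; i < chunks; i++ {
			st.Add(1)
			fmt.Printf("%s: %d%%\n", st.name, st.Percent())
		}
	}
}
```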

If we have a seed index for a file/device that is corrupted we may want
to try harder and continue anyway by discarding the invalid seed and
fallback to the potentially other seeds and/or by just using the store.

This has been added with a new option "--skip-invalid-seeds".

In order to not reduce the installation speed, the process of choosing
and validating the chunks has been refactored, introducing a new concept
called "plan". We create a plan about where to look for the chunks ahead
of time, before starting to actually write in the destination.

Signed-off-by: Ludovico de Nittis <ludovico.denittis@collabora.com>
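
The "plan" concept from the commit message can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the code from this PR: chunkID, seed, step, and buildPlan are invented names, and validation is reduced to a boolean flag, whereas real validation has to read and hash the seed data.

```go
package main

import "fmt"

// chunkID stands in for a chunk checksum.
type chunkID string

// seed is a hypothetical seed: the chunks it claims to hold, plus the
// outcome of validating it (reduced to a flag for this sketch).
type seed struct {
	name   string
	chunks map[chunkID]bool
	valid  bool
}

// step records where one chunk of the output will be read from.
type step struct {
	chunk  chunkID
	source string // a seed name, or "store"
}

// buildPlan decides, before anything is written, where each chunk of the
// index comes from. Seeds are tried in slice order, so the plan is
// deterministic. Seeds that failed validation are ignored, and any chunk
// without a usable seed falls back to the store.
func buildPlan(index []chunkID, seeds []seed) []step {
	plan := make([]step, 0, len(index))
	for _, id := range index {
		src := "store"
		for _, s := range seeds {
			if s.valid && s.chunks[id] {
				src = s.name
				break
			}
		}
		plan = append(plan, step{chunk: id, source: src})
	}
	return plan
}

func main() {
	index := []chunkID{"a", "b", "c"}
	seeds := []seed{
		{name: "seed1", chunks: map[chunkID]bool{"a": true, "b": true}, valid: false},
		{name: "seed2", chunks: map[chunkID]bool{"b": true}, valid: true},
	}
	for _, st := range buildPlan(index, seeds) {
		fmt.Printf("%s <- %s\n", st.chunk, st.source)
	}
}
```

Because the decision is made up front, discarding an invalid seed only means building the plan again without it, before any data has reached the destination.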

folbricht · 2021-12-30T15:22:17Z

Thank you for working through all of this

RyuzakiKK marked this pull request as draft December 8, 2021 09:12

RyuzakiKK force-pushed the invalid_chunks branch from 18246ba to e11a329 Compare December 8, 2021 16:39

RyuzakiKK changed the title ~~Discard seed if it has invalid chunks~~ Add option to discard invalid seeds Dec 8, 2021

RyuzakiKK force-pushed the invalid_chunks branch from e11a329 to fa206bd Compare December 8, 2021 16:45

RyuzakiKK marked this pull request as ready for review December 8, 2021 16:49

folbricht reviewed Dec 9, 2021

RyuzakiKK force-pushed the invalid_chunks branch from fa206bd to c1b8a1f Compare December 9, 2021 14:09

folbricht reviewed Dec 12, 2021

RyuzakiKK force-pushed the invalid_chunks branch 2 times, most recently from aca9b26 to 3197da5 Compare December 14, 2021 16:28

folbricht reviewed Dec 15, 2021

RyuzakiKK force-pushed the invalid_chunks branch 2 times, most recently from 7d32123 to 1ca0417 Compare December 15, 2021 14:38

folbricht reviewed Dec 15, 2021

RyuzakiKK force-pushed the invalid_chunks branch from 1ca0417 to 3f9c4ef Compare December 17, 2021 17:00

folbricht reviewed Dec 18, 2021

RyuzakiKK force-pushed the invalid_chunks branch from 3f9c4ef to fde373f Compare December 20, 2021 11:39

folbricht reviewed Dec 21, 2021

RyuzakiKK force-pushed the invalid_chunks branch from fde373f to c48c414 Compare December 21, 2021 11:17

folbricht reviewed Dec 25, 2021

RyuzakiKK force-pushed the invalid_chunks branch 2 times, most recently from 12f18b6 to 9bd77d9 Compare December 28, 2021 11:42

folbricht reviewed Dec 28, 2021

RyuzakiKK force-pushed the invalid_chunks branch from 9bd77d9 to 9545c56 Compare December 29, 2021 11:20

folbricht approved these changes Dec 30, 2021

folbricht merged commit a4c6fd2 into folbricht:master Dec 30, 2021
