Retry loading of corrupted data from backend / cache #4800
Merged
MichaelEischer force-pushed the cleanup-load branch from a00785b to 17e0d1a on May 9, 2024 21:58
This was referenced May 10, 2024
MichaelEischer force-pushed the cleanup-load branch from 99bf392 to 6d58863 on May 18, 2024 18:09
The helper is only intended for use by backend implementations.
LoadRaw also includes improved context cancellation handling similar to the implementation in repository.LoadUnpacked. The removed cache backend test will be added again later on.
This replaces calling the low-level backend.Load() method.
Both functions were using a similar implementation.
LoadBlobsFromPack already implements the same fallback behavior.
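The fallback referred to here (load the requested blobs from a pack in one pass, then load any blob that failed individually via `LoadBlob`) might look roughly like the following Go sketch. All names (`blobID`, `errBlobCorrupt`, `loadBlobsWithFallback` and its `batch`/`single` callbacks) are hypothetical and do not reflect the actual signature of restic's `LoadBlobsFromPack`.

```go
package main

import "errors"

// blobID is a hypothetical blob identifier used only in this sketch.
type blobID string

// errBlobCorrupt is a hypothetical error a batch loader might report for a damaged blob.
var errBlobCorrupt = errors.New("blob is corrupt")

// loadBlobsWithFallback first loads all ids in one batch (e.g. one pass over a
// pack file). Every blob the batch did not deliver successfully is then loaded
// individually via single, which may apply its own retry logic.
func loadBlobsWithFallback(
	ids []blobID,
	batch func(ids []blobID, handle func(id blobID, data []byte, err error)),
	single func(id blobID) ([]byte, error),
) (map[blobID][]byte, error) {
	result := make(map[blobID][]byte, len(ids))
	batch(ids, func(id blobID, data []byte, err error) {
		if err == nil {
			result[id] = data
		}
		// Errors are not fatal here; the blob is retried individually below.
	})
	for _, id := range ids {
		if _, ok := result[id]; ok {
			continue
		}
		data, err := single(id)
		if err != nil {
			return nil, err
		}
		result[id] = data
	}
	return result, nil
}
```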
This warning should already have been removed once the feature flag was dropped.
A file is always cached whole. Thus, any out-of-bounds access would also fail when directed at the backend. To handle the case in which the cached file is broken, the caller must call Cache.Forget(h) for the file in question.
This is inspired by the circuit breaker pattern used in distributed systems: if too many requests fail, it is better to fail new requests immediately for a limited time to give the backend time to recover. By forgetting a file in the cache at most once, we ensure that a broken file is re-downloaded from the backend at most once. Previously, if the file stored at the backend was broken, it would be cached and deleted over and over. Now it is retrieved only once more; all later requests use the cached copy and either succeed or fail immediately.
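The forget-at-most-once bookkeeping can be sketched with a mutex-protected set of already-forgotten handles. This is an in-memory stand-in that assumes string handles; the type and method names are hypothetical, and restic's `Cache.Forget` may signal a repeated call differently (for example, as an error rather than a boolean).

```go
package main

import "sync"

// forgetOnceCache is a hypothetical in-memory stand-in for a file cache.
// Only the "forget each handle at most once" rule mirrors the behavior above.
type forgetOnceCache struct {
	mu        sync.Mutex
	files     map[string][]byte   // cached contents by handle
	forgotten map[string]struct{} // handles that were already forgotten once
}

func newForgetOnceCache() *forgetOnceCache {
	return &forgetOnceCache{
		files:     make(map[string][]byte),
		forgotten: make(map[string]struct{}),
	}
}

// Put stores a (re-)downloaded file in the cache.
func (c *forgetOnceCache) Put(h string, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.files[h] = data
}

// Get returns the cached copy of h, if any.
func (c *forgetOnceCache) Get(h string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	data, ok := c.files[h]
	return data, ok
}

// Forget removes the cached copy of h only the first time it is called for h
// and reports whether it did so. Later calls are no-ops, so a file that is
// broken at the backend is re-downloaded at most once; afterwards requests
// use the cached copy and fail fast instead of looping download -> delete.
func (c *forgetOnceCache) Forget(h string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, done := c.forgotten[h]; done {
		return false
	}
	c.forgotten[h] = struct{}{}
	delete(c.files, h)
	return true
}
```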
This ensures that the pack header is actually read completely. Previously, for a truncated file it was possible to read only part of the header, as backend.Load(...) is not guaranteed to return as many bytes as requested by the length parameter.
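The difference between a single `Read` call and `io.ReadFull` can be shown with a reader that returns one byte per call, which the `io.Reader` contract permits. `trickleReader`, `readHeader` and `isTruncated` are hypothetical helpers for this sketch; only `io.ReadFull`, `io.EOF` and `io.ErrUnexpectedEOF` are standard library APIs.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// trickleReader returns at most one byte per Read call, mimicking a source
// that delivers data in small chunks.
type trickleReader struct{ data []byte }

func (r *trickleReader) Read(p []byte) (int, error) {
	if len(r.data) == 0 {
		return 0, io.EOF
	}
	if len(p) == 0 {
		return 0, nil
	}
	p[0] = r.data[0]
	r.data = r.data[1:]
	return 1, nil
}

// readHeader reads exactly n bytes. io.ReadFull keeps calling Read until the
// buffer is full; it returns io.ErrUnexpectedEOF if the stream ends early
// (or io.EOF if nothing was read), so a truncated header is always detected.
func readHeader(rd io.Reader, n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(rd, buf); err != nil {
		return nil, fmt.Errorf("reading %d-byte header: %w", n, err)
	}
	return buf, nil
}

// isTruncated reports whether err indicates that the input ended too early.
func isTruncated(err error) bool {
	return errors.Is(err, io.ErrUnexpectedEOF) || errors.Is(err, io.EOF)
}
```

A single `Read` on a `trickleReader` with a 6-byte buffer returns only 1 byte without error, which is exactly the partial-header case described above.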
MichaelEischer force-pushed the cleanup-load branch from 6d58863 to 74d9065 on May 18, 2024 19:29
MichaelEischer commented on May 18, 2024:
LGTM
What does this PR change? What problem does it solve?
The PR cleans up how data is loaded from the backend and ensures that all places include a fallback to handle transient errors or a corrupt file in the cache.
The main component is `repository.LoadRaw`, which replaces `backend.LoadAll`. In addition, almost all direct `backend.Load()` uses outside the backend/cache code have been replaced with calls to `LoadRaw` to benefit from its error handling. The only exceptions are `LoadBlob`, `ListPack`, `checkPack` and `LoadBlobsFromPack`. The first three include custom retry code to handle transient errors, and `LoadBlobsFromPack` falls back to `LoadBlob`.

Direct uses of `Backend.Load()` outside the repository package are now discouraged and will eventually no longer be possible.

The retry strategy for transient errors generally consists of explicitly forgetting the damaged file from the cache and retrying the operation once. These retries no longer happen as part of the retries performed by the `RetryBackend`; instead, they are handled separately. Thus, retries triggered by the `RetryBackend` now relate exclusively to backend / download errors. For example, a pack file with corrupt blobs will now be downloaded at most twice.

In conjunction with this retry strategy, the cache no longer automatically removes a file if the processing callback failed. In particular, since #4605 the automatic removal only worked in a few cases. Now, `LoadRaw` etc. explicitly pass the broken cached file to `cache.Forget(h)`. `cache.Forget(h)` deletes a given file at most once during the runtime of restic. This ensures that if a file is corrupted at the backend, it will be downloaded and forgotten at most once.

The cached file is now considered authoritative. Requesting a file section that is out of bounds of the cached file now immediately yields an error. The retry strategy of `LoadRaw` etc. then forgets broken files. The main benefit of the new behavior is that out-of-bounds accesses to truncated files now fail immediately if the file is cacheable, instead of causing endless retries.

The changes are complemented with a few cleanups to reduce code duplication:
Was the change previously discussed in an issue or on the forum?
Prerequisite for #4784, see #4627.
Related to #4774.
Checklist
- [ ] I have added documentation for relevant changes (in the manual).
- … `changelog/unreleased/` that describes the changes for our users (see template).
- … `gofmt` on the code in all commits.