New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize prune memory usage #3899
Conversation
f74d139
to
4cb8d49
Compare
4cb8d49
to
b933209
Compare
b933209
to
b0fa420
Compare
b0fa420
to
2364df2
Compare
2364df2
to
d87a940
Compare
This allows maintaining a usage counter for each blob.
The set covers necessary, existing and duplicate blobs. This removes the duplicate sets used to track whether all necessary blobs also exist. This reduces the memory usage of prune by about 20-30%.
As long as only a small fraction of the data in a repository is rewritten, the keepBlobs set will be rather small after cleaning it up. As golang maps do not shrink their memory usage, just copy the contents over to a new map. However, only copy the map if the cleanup removed at least half the entries.
d87a940
to
d966c52
Compare
I've rebased the PR, added a few more comments and made the sanity check at the end of |
LGTM. |
For what version of restic is this expected? I run 0.11.0 and I think it doesn't have this improvement, right? |
@molecular This PR was merged on Oct 22, 2022, thus it can't be part of restic 0.11.0 which was released on Nov 5, 2020. You need at least restic 0.15.0. |
@MichaelEischer, yeah, thanks. I wonder why the package isn't being updated in raspbian bullseye. Just ran a full system update, still version 0.11. Manualls installed 0.15.2 and giving it a spin. EDIT: it worked! Thanks a ton for that optimization! |
Debian (and also raspbian) usually only switch to a newer software version in conjunction with a new Debian release. |
What does this PR change? What problem does it solve?
It introduces a CountedBlobSet and uses it to track necessary, existing and duplicate blobs. This replaces the duplicate sets used to track whether all necessary blobs also exist. This reduces the memory usage of prune by about 20-30%. That set is also copied after removing blobs that are not modified by prune to reduce memory usage during the repack step.
The PR also contains an optimization for (Master)Index.Each to reduce the number of context switches.
Was the change previously discussed in an issue or on the forum?
No.
Checklist
[ ] I have added documentation for relevant changes (in the manual).changelog/unreleased/
that describes the changes for our users (see template).gofmt
on the code in all commits.