Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize prune memory usage #3899

Merged
merged 5 commits into from Oct 22, 2022

Conversation

MichaelEischer
Copy link
Member

What does this PR change? What problem does it solve?

It introduces a CountedBlobSet and uses it to track necessary, existing and duplicate blobs. This replaces the duplicate sets used to track whether all necessary blobs also exist. This reduces the memory usage of prune by about 20-30%. That set is also copied after removing blobs that are not modified by prune to reduce memory usage during the repack step.

The PR also contains an optimization for (Master)Index.Each to reduce the number of context switches.

Was the change previously discussed in an issue or on the forum?

No.

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • [ ] I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

This allows maintaining a usage counter for each blob.
The set covers necessary, existing and duplicate blobs. This removes the
duplicate sets used to track whether all necessary blobs also exist.
This reduces the memory usage of prune by about 20-30%.
As long as only a small fraction of the data in a repository is
rewritten, the keepBlobs set will be rather small after cleaning it up.
As golang maps do not shrink their memory usage, just copy the contents
over to a new map. However, only copy the map if the cleanup removed at
least half the entries.
@MichaelEischer
Copy link
Member Author

I've rebased the PR, added a few more comments and made the sanity check at the end of packInfoFromIndex a bit stricter.

@MichaelEischer
Copy link
Member Author

LGTM.

@molecular
Copy link

For what version of restic is this expected? I run 0.11.0 and I think it doesn't have this improvement, right?

@MichaelEischer
Copy link
Member Author

MichaelEischer commented Jun 6, 2023

@molecular This PR was merged on Oct 22, 2022, thus it can't be part of restic 0.11.0 which was released on Nov 5, 2020. You need at least restic 0.15.0.

@molecular
Copy link

molecular commented Jun 13, 2023

@MichaelEischer, yeah, thanks. I wonder why the package isn't being updated in raspbian bullseye. Just ran a full system update, still version 0.11.

Manualls installed 0.15.2 and giving it a spin.

EDIT: it worked! Thanks a ton for that optimization!

@MichaelEischer
Copy link
Member Author

I wonder why the package isn't being updated in raspbian bullseye. Just ran a full system update, still version 0.11.

Debian (and also raspbian) usually only switch to a newer software version in conjunction with a new Debian release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants