prune: Avoid running the "find data that is still in use" step when not needed #812
Comments
You don't know how lucky you are. ;) The prune stage on my personal repo runs for about 15h. :( But, back to the subject. '0 duplicate blobs, 0B duplicate' is informative, but it is not an estimate of how much will be pruned. Finding packs to delete or rewrite happens later, in the "find data" stage, which is the slowest stage. From what I know, duplicate blobs can happen only if you back up to the same repo simultaneously. But prune's main task is to find what you have "forgotten" before, and unfortunately that takes time, because the whole repo must be scanned. @fd0 promised to optimize it in the future.
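To make the distinction above concrete, here is a minimal sketch of a mark-and-sweep pass over a toy repository. It is not restic's actual implementation: the function name, the flat snapshot-to-blob mapping, and the sample IDs are all assumptions made for illustration (a real repository would have to walk each snapshot's trees to learn which blobs it references).

```python
from typing import Dict, Set


def find_unused_blobs(
    snapshots: Dict[str, Set[str]],
    packs: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per pack, the blobs that no remaining snapshot references."""
    # Mark phase: collect every blob referenced by any snapshot that still exists.
    used: Set[str] = set()
    for blobs in snapshots.values():
        used |= blobs

    # Sweep phase: a pack needs attention only if it holds unreferenced blobs.
    unused: Dict[str, Set[str]] = {}
    for pack_id, blobs in packs.items():
        leftover = blobs - used
        if leftover:
            unused[pack_id] = leftover
    return unused


# Hypothetical data: no blob is stored twice, yet two packs hold unused blobs.
snapshots = {"snap-a": {"b1", "b2"}, "snap-b": {"b2", "b3"}}
packs = {"pack-1": {"b1", "b2"}, "pack-2": {"b3", "b4"}, "pack-3": {"b5"}}
result = find_unused_blobs(snapshots, packs)
print(result)
```

In this toy data there are zero duplicate blobs, but `pack-2` and `pack-3` still contain data nothing references, which is why the duplicate count alone cannot tell prune whether the later stage can be skipped.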
Thanks, @zcalusic, for explaining these details. I understand now that things are not how I thought they were. Unless there is a reason to keep this ticket open, it may be closed.
Thanks @zcalusic for the correct explanation. This will be solved when I get around to implementing the local metadata cache; prune should then be greatly sped up. In addition, I'm thinking about adding the list of blobs referenced by a snapshot to the local cache, so that restic runs only once per snapshot. I'm going to close this issue for now; #29 tracks implementing this metadata cache.
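One reading of the per-snapshot idea above can be sketched as a small memoizing cache. This is only an illustration under stated assumptions: the class name `ReferencedBlobCache`, the `walk` callback, and the sample data are hypothetical, the cache lives in memory rather than on disk, and none of it reflects how restic actually stores or will store this information.

```python
from typing import Callable, Dict, Iterable, Set


class ReferencedBlobCache:
    """Remember which blobs each snapshot references (in-memory sketch)."""

    def __init__(self, walk: Callable[[str], Set[str]]) -> None:
        self._walk = walk                    # expensive: walks one snapshot's trees
        self._cache: Dict[str, Set[str]] = {}
        self.walks = 0                       # counts how often the slow path ran

    def blobs_for(self, snapshot_id: str) -> Set[str]:
        # Walk a snapshot only the first time it is seen; reuse the result after.
        if snapshot_id not in self._cache:
            self.walks += 1
            self._cache[snapshot_id] = set(self._walk(snapshot_id))
        return self._cache[snapshot_id]

    def used_blobs(self, snapshot_ids: Iterable[str]) -> Set[str]:
        used: Set[str] = set()
        for snapshot_id in snapshot_ids:
            used |= self.blobs_for(snapshot_id)
        return used


# Hypothetical stand-in for walking a snapshot's trees in the repository.
FAKE_TREES = {"snap-a": {"b1", "b2"}, "snap-b": {"b2", "b3"}, "snap-c": {"b4"}}


def fake_walk(snapshot_id: str) -> Set[str]:
    return FAKE_TREES[snapshot_id]


cache = ReferencedBlobCache(fake_walk)
first = cache.used_blobs(["snap-a", "snap-b"])
second = cache.used_blobs(["snap-a", "snap-b", "snap-c"])  # only snap-c is walked
```

Because snapshots do not change once written, a later run would only pay the walking cost for snapshots added since the previous run.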
PR #817 may also be interesting, it adds a
Output of restic version
This is a suggestion for an enhancement ... even though I am not 100% sure it is a valid case, but let's hear what you think.
When re-running a prune job on a repository which is already pruned, it still does this:
Is it truly necessary to run the 'find data that is still in use...' step and everything after it if '0 duplicate blobs, 0B duplicate' is the result of the preliminary check?
I can see that in some cases such maintenance may make some sense, so I would suggest adding a --force argument to the prune mode to keep the current behaviour.
The reason for this request is that I have a script which runs in the background whenever I log into my computer. The first thing it does is run restic prune, so that a clean-up happens at least once a day. It then enters a loop that runs restic forget and restic backup at certain intervals throughout the day, until I log out and shut down my computer. As the restic prune job can easily take 30-45 minutes in my setup (even longer when I'm connected to a VPN), it would be great to speed up this pruning step when it is not strictly needed. Currently my script skips the prune step when I'm on the VPN rather than on my local LAN, to reduce both CPU and network load.
Any thoughts?
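The script logic described above could be sketched roughly as follows. This is not the reporter's actual script: the function names, the `last_prune` timestamp, and the once-a-day threshold are assumptions introduced for illustration, and the commands are listed exactly as the issue names them (real invocations would also need a retention policy and backup paths, which the issue does not give).

```python
from datetime import datetime, timedelta
from typing import List, Optional

PRUNE_INTERVAL = timedelta(days=1)  # assumption: "at least once a day"


def should_prune(last_prune: Optional[datetime], now: datetime, on_vpn: bool) -> bool:
    """Decide whether the slow prune step is worth running right now."""
    if on_vpn:
        return False          # skip on the VPN to save CPU and network load
    if last_prune is None:
        return True           # never pruned before
    return now - last_prune >= PRUNE_INTERVAL


def plan_commands(
    last_prune: Optional[datetime], now: datetime, on_vpn: bool
) -> List[List[str]]:
    """Build the command list for one pass; nothing is executed here."""
    commands: List[List[str]] = []
    if should_prune(last_prune, now, on_vpn):
        commands.append(["restic", "prune"])
    # The periodic part of the loop described in the issue.
    commands.append(["restic", "forget"])
    commands.append(["restic", "backup"])
    return commands


if __name__ == "__main__":
    now = datetime.now()
    for command in plan_commands(last_prune=None, now=now, on_vpn=False):
        print(" ".join(command))
```

Keeping the decision in a pure function makes it easy to check without touching a repository; each planned list could then be handed to `subprocess.run` once the missing arguments are filled in.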