resumable prune #3806

Closed
pille opened this issue Jun 25, 2022 · 6 comments · Fixed by #4812
@pille

pille commented Jun 25, 2022

Output of restic version

restic 0.13.1 compiled with go1.18 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

pruning can take a long time, especially if there's a lot to repack and upload.
an interrupted prune is currently not resumable, because the next prune run removes unreferenced packs first.
after an interruption, those packs contain the repacked blobs that should be kept after the prune.
they get deleted and then rebuilt when pruning again.

if this deletion did not occur at the beginning, the already uploaded blobs would probably be detected and used instead of being rebuilt and reuploaded.

moving the deletion of replaced packs to the end would probably need some additional changes, because the same blob would exist in two packs. so maybe once it's clear which packs to remove (after the initial scan), restic should mark them as obsolete, and their blobs should not be referenced until the packs are deleted later.

What are you trying to do? What problem would this solve?

see above.
prunes would be resumable.
as they may be long-running due to lots of reuploads, you're currently forced to choose between periodic prunes and the actual backups they block.
connection issues would also have less impact.

maybe this will even get rid of the prune locks (by marking packs obsolete and removing them eventually)

Did restic help you today? Did it make you happy in any way?

i'm using restic every day and am happy with its reliability.
i'll be more happy, when it works on its own.

@aawsome
Contributor

aawsome commented Jun 28, 2022


In order to solve this, IMO we have to add #3290 and save processed packs in preliminary index files during prune.

This is how it is implemented in rustic, see e.g.
https://github.com/rustic-rs/rustic/blob/main/src/commands/prune.rs

If you want to try out how prune works with these two features, feel free to try out rustic. But note that rustic prune does a two-phase prune, i.e. instead of removing packs it first marks them for removal and only removes them on a following prune run if they have been marked long enough.
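To illustrate the two-phase idea described above, here is a toy sketch that works on plain files in a directory. It is not how rustic or restic store anything: the `.marked` sidecar files and the function names `mark_pack` and `sweep_marked` are made up for this example. The point is only that "delete" becomes "mark now, remove on a later run once the mark is old enough".

```shell
#!/bin/bash

# Toy model of a two-phase prune on plain files (NOT rustic's real format).
# The ".marked" sidecar files and both function names are hypothetical.

# Phase 1: instead of deleting a pack, record when it was marked for removal.
# An existing mark is kept, so the age is counted from the first marking.
mark_pack() {
    local dir="$1" pack="$2"
    [ -e "$dir/$pack.marked" ] || date +%s > "$dir/$pack.marked"
}

# Phase 2 (a later run): remove only packs marked at least min_age seconds ago.
sweep_marked() {
    local dir="$1" min_age="$2"
    local now marker marked_at pack
    now=$(date +%s)
    for marker in "$dir"/*.marked; do
        [ -e "$marker" ] || continue   # glob matched nothing
        marked_at=$(cat "$marker")
        if (( now - marked_at >= min_age )); then
            pack="${marker%.marked}"
            rm -f "$pack" "$marker"
            echo "removed $(basename "$pack")"
        fi
    done
}
```

For example, `mark_pack /tmp/packs old.pack` followed by `sweep_marked /tmp/packs 86400` on a later day would remove `old.pack` only once the mark is at least a day old. An interrupted run leaves the marked packs in place, which is what makes the approach tolerant of aborts.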

@JsBergbau
Contributor

Just a hint: I'm currently pruning and converting a 9 TB repo to the new compression format. Since the HDD is not big enough to prune it all in one go, I use this simple script:

#!/bin/bash

# Repack at most 1 TiB (1099511627776 bytes) per run, so an interrupted
# run only loses the work of that one run.
for i in {1..10}
do
    ./restic prune --repack-uncompressed --compression max --no-cache --max-repack-size 1099511627776
done

You can lower the repack size and increase the loop counter, so you lose less prune progress if a run is interrupted. Of course it is not exactly what you suggest, but for me it suffices.
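A slightly more defensive variant of the same loop might look like the sketch below. It stops at the first failed run instead of blindly starting the next one, and takes the number of runs and the chunk size as arguments. The function name `prune_in_chunks` and the `RESTIC` override variable are made up for this example; the restic flags are the ones used in the script above, and `256G` relies on the size suffixes that `--max-repack-size` accepts.

```shell
#!/bin/bash

# Run `restic prune` in bounded chunks so that an interruption costs at most
# one chunk of repacking work. Stops at the first failing run.
# RESTIC is a hypothetical override for the binary; it defaults to "restic".
prune_in_chunks() {
    local runs="$1"    # maximum number of prune runs
    local chunk="$2"   # value passed to --max-repack-size, e.g. 256G
    local i
    for ((i = 1; i <= runs; i++)); do
        echo "prune run $i/$runs (max repack size $chunk)"
        if ! "${RESTIC:-restic}" prune --repack-uncompressed --compression max \
                --max-repack-size "$chunk"; then
            echo "prune run $i failed, stopping" >&2
            return 1
        fi
    done
}
```

Calling `prune_in_chunks 40 256G` would cover roughly the same amount of repacking as ten 1 TiB runs, but with less work at risk per run.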

@Zaxim

Zaxim commented Apr 28, 2023

I've run into this issue recently while compressing and repacking a 12TB repo. Because prune can't resume, I've been using --max-repack-size 750gb and manually rerunning it after each run finishes. For some reason the VPS I'm using keeps going down, so twice now I've lost hundreds of GBs of uploads that then had to be reuploaded, and my upload unfortunately maxes out at 5MBps. The solution is obviously to lower my --max-repack-size, but it was a painful gotcha.

@aawsome
Contributor

aawsome commented Apr 28, 2023

@Zaxim The main "problem" of restic not having a resumable prune is that the new index is only stored after all pack files are written. If you rerun an aborted prune, all already-written pack files are therefore treated as additional and unused, and are removed again.

So, if your prune run aborts and you don't want to lose the already-created pack files, you must manually run rebuild-index to add them to the index. Then the next prune run recognizes them and, thanks to #3290 (IIRC), it will choose to keep the pack files generated by the last prune run and remove the old, already-repacked pack files.
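The manual recovery described above could be wrapped in a small script along the following lines. This is only a sketch: the marker file, the `MARKER` and `RESTIC` variables and the function name `resumable_prune` are made up for this example and are not something restic itself knows about. The idea is to leave a marker while prune is running, so that the next invocation can tell the previous one did not finish and run `rebuild-index` first. Newer restic releases call this command `repair index`.

```shell
#!/bin/bash

# MARKER is a hypothetical state file maintained by this script only.
MARKER="${MARKER:-$HOME/.cache/restic-prune-in-progress}"

# Run prune; if the previous run did not finish, rebuild the index first so
# that pack files written by the aborted run are known to the next prune.
# All arguments are passed through to `restic prune`.
resumable_prune() {
    local restic_bin="${RESTIC:-restic}"
    if [ -e "$MARKER" ]; then
        echo "previous prune did not finish, rebuilding index first" >&2
        "$restic_bin" rebuild-index || return 1
    fi
    mkdir -p "$(dirname "$MARKER")" || return 1
    touch "$MARKER" || return 1
    "$restic_bin" prune "$@" || return 1
    # Only reached when prune succeeded.
    rm -f "$MARKER"
}
```

A call such as `resumable_prune --max-repack-size 256G` behaves like a plain prune on a clean start. After a crash or reboot the marker is still there, so the next call rebuilds the index before pruning.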

@Zaxim

Zaxim commented Apr 28, 2023

Sweet! That's good to know that I can just run rebuild-index if it happens again.
