Restoring with exclude/include confusion #373
Hey, thanks for reaching out. Your description helps us a lot; it shows that we need to improve the documentation for the restore command. The include/exclude patterns are used to restore only matching files and directories from one snapshot. First of all: unfortunately, the restore process is not yet optimized and runs sequentially. Especially for remote repositories like s3, this means the latency to the storage location adds up considerably. Now, for the concrete problem you have, I see two solutions:
For the restore filter there are two modes. I'll give an example of restoring only a specific directory from the snapshot. Suppose you've created a snapshot of your home directory; internally, restic creates the top-level directory of the snapshot, and with an include filter restic checks, for each file/item, whether the pattern matches before restoring it. The easiest (and possibly also the fastest) option would be to use the fuse mount. Did you try that yet?
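To make both options concrete, here is a rough sketch. The repository URL, snapshot choice, and paths below are placeholders for illustration, not taken from this thread; adjust them to your setup:

```shell
# Option 1: restore only a matching subtree from a snapshot.
# --include keeps only items whose path matches the pattern.
restic -r s3:s3.amazonaws.com/my-bucket restore latest \
    --target /tmp/restore --include "/home/user/project"

# Option 2: mount the repository via FUSE and copy files out directly.
restic -r s3:s3.amazonaws.com/my-bucket mount /mnt/restic
cp -a /mnt/restic/snapshots/latest/home/user/project /tmp/restore
```

With the FUSE mount you can browse snapshots like a normal filesystem and only the files you actually read are fetched from the backend.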
If you need more support, feel free to drop into the IRC channel; I'm available now.
Added #374 to track the bug you discovered.
Thanks @fd0, I took the FUSE mount route and it worked great, thank you! In terms of optimizing restic for services like S3: do the owner(s) want to take restic down the route of backend modules specifically optimized per service, or should those modules remain a close abstraction of the backend interface from the restic core, with the core orchestrating most of the heavy lifting? For example, I believe the S3 backend, a bit like its restore behavior, writes a single blob of any one pack and then puts that single blob to the S3 bucket. Would there be a performance increase from queuing a number of blobs to be batch-uploaded using the S3 multipart functionality?
Then I declare your restore operation a success. If you agree, please close this issue. If you have an idea on how to improve the documentation, please create a new issue so we can track that.
I'll respond here:
I think we'll try to make the backend implementations as thin as possible and have the core take care of e.g. retries. Otherwise we'd need to implement that for every single backend, which would probably lead to a lot of code duplication.
I don't think that's right. The s3 backend (like any other backend) stores packs, which consist of one or more blobs. Typically there are between four and 2000 blobs within a single pack.
I think that's already optimized quite well at the blob/pack abstraction level; that should be enough. As I already wrote, the restore process is not optimized at all, which means that blobs are fetched sequentially rather than concurrently. In contrast, the backup process is already heavily optimized to write many packs in parallel.
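Here is a rough illustration of why sequential fetching hurts on a high-latency backend. This is not restic code (restic is written in Go); it just simulates per-request latency and compares a sequential loop against overlapped requests:

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # simulated round-trip time to a remote backend like s3

def fetch_blob(blob_id):
    """Stand-in for fetching one blob; the sleep models backend latency."""
    time.sleep(LATENCY)
    return f"blob-{blob_id}"

blob_ids = list(range(20))

# Sequential fetching: total time grows as n * latency.
start = time.monotonic()
sequential = [fetch_blob(b) for b in blob_ids]
t_seq = time.monotonic() - start

# Concurrent fetching: requests overlap, so the latency is amortized.
start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    concurrent = list(pool.map(fetch_blob, blob_ids))
t_conc = time.monotonic() - start

print(f"sequential: {t_seq:.2f}s  concurrent: {t_conc:.2f}s")
```

With 20 simulated blobs and 10 workers, the sequential pass pays the full latency 20 times while the concurrent pass pays it roughly twice, which is the kind of speedup an optimized restore could get.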
@fd0 - Thank you for clearing up some confusion; I'm loving what this project can offer. I'll close this, as my issues have been dealt with. Best of luck!
Thanks, and please continue to open issues for all things you come across that aren't clear ;) I'm glad you have your data back.
I was wondering if someone could briefly explain the exclude/include functionality behind the snapshot restoration feature. I do not wish to annoy anyone by bringing this matter up as a GitHub issue, but I am under a lot of pressure to complete a project, and that project now exists only in my remote S3 restic repo, as yesterday my Mac decided to melt its logic board.
I made a backup of my whole system a couple of days ago and today would like to restore only a specific directory from within that single snapshot.
First of all, is the exclude/include functionality used to restore specific directories/files from within a single snapshot, or is it used to restore single snapshots containing those specific directories/files?
I have tried many restore commands, and all seem to be running very slowly for what they should be restoring.
The restore processes are still running, so I'm not 100% certain of the outcome, but the time they are taking raises the suspicion that they are not simply finding those patterns and restoring the matches to the target location.
I've reached one of two conclusions: either my command isn't doing what I expect, which is restoring just a specified directory, or it is doing what I expected but not as directly as I expected. Maybe restic is querying each element within the repo and restoring those that match the pattern? The latter seems most likely to me, given the use of the word 'pattern' rather than 'path' in the include/exclude help documentation.
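Roughly, what I imagine is happening is per-item pattern matching, something like this simplified sketch (hypothetical paths, and almost certainly not restic's actual matcher):

```python
from fnmatch import fnmatch

def matches(path, include_patterns):
    # Simplified guess at include semantics: keep an item if a pattern
    # matches the path itself, or the path lies below a matching directory.
    return any(fnmatch(path, pat) or fnmatch(path, pat + "/*")
               for pat in include_patterns)

# Hypothetical item paths from an imagined snapshot of a home directory.
snapshot_paths = [
    "home/user/project/main.go",
    "home/user/project/docs/readme.md",
    "home/user/photos/cat.jpg",
]

restored = [p for p in snapshot_paths if matches(p, ["home/user/project"])]
print(restored)
```

If something like this is what restic does, it would have to walk every element of the snapshot to test the pattern, which would explain the long runtimes I'm seeing.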
I apologize for not just looking through the design docs and source code to deduce the answer myself; I need the support of the GitHub community more than ever!