
Restoring with exclude/include confusion #373

Closed
lbennett-stacki opened this issue Dec 14, 2015 · 8 comments
Labels
type: question/problem (usage questions or problem reports)

Comments

@lbennett-stacki

I was wondering if someone could briefly explain the exclude/include functionality behind the snapshot restoration feature? I do not wish to annoy anyone by bringing this matter up as a GitHub issue; I am simply under a lot of pressure to complete a project, and the project now exists only in my remote S3 restic repo because yesterday my Mac decided to melt its logic board.

I made a backup of my whole system a couple of days ago, and today I would like to restore only a specific directory from within that single snapshot.

First of all, is the exclude/include functionality used to restore specific directories/files from within a single snapshot or is it used to restore single snapshots containing those specific directories/files?

I have tried many restore commands and all of them seem to be running very slowly for what they should be restoring.

The restoration processes are still running, so I'm not 100% certain of the outcome, but the time they are taking raises the suspicion that they are not simply finding those patterns and restoring the matches to the target location.

I've reached the conclusion that either my command isn't doing what I expect it to do, namely restoring just a specified directory, or it is doing what I expected but not as directly as I expected. Maybe it is querying each element within the repo and restoring the ones that match the pattern? The latter seems most likely to me, given the use of the word 'pattern' rather than 'path' in the include/exclude help documentation.

I apologise for not just looking through the design docs and source code to deduce the answer myself; I need the support of the GitHub community more than ever!

@fd0
Member

fd0 commented Dec 14, 2015

Hey, thanks for reaching out. Your description helps us a lot; it shows that we need to improve the documentation for the restore command.

The include/exclude patterns are used to only restore matching files and directories from one snapshot.

First of all: unfortunately the restore process is not yet optimized and runs sequentially. Especially for remote repositories like S3, this means that the latency to the storage location adds up to a lot.

Now, for the concrete problem you have, I see two solutions:

  • With the restic restore filters, you can limit the restore to a directory by using --include
  • With the fuse mount, you can mount the repository to a local directory and browse the snapshots; you can then use cp or rsync or even the Finder to copy data out.

For the restore filter there are two modes. For restoring only a specific directory from the snapshot, using --include once is sufficient.

I'll give an example. Suppose you've created a snapshot of your home directory, /home/user, by running:

$ restic backup /home/user

This means that internally, restic creates the top-level directory user and puts the files in there. You can check that by running restic ls $SNAPSHOT_ID. If you only want to restore everything below /home/user/work/web/jekyll to /tmp/restore, you can run restic as follows:

$ restic restore --target /tmp/restore --include "web/jekyll" $SNAPSHOT_ID

With this include filter, restic checks, for each file/item, whether the pattern web/jekyll matches somewhere in its path. I've just discovered that specifying an absolute path as an include pattern currently does not work and restores nothing; I'll add an issue about it.
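To make that concrete, here is a purely illustrative session (the snapshot ID, file names, and exact output format are made up; run restic snapshots and restic ls against your own repository to see the real values):

$ restic ls a1b2c3d4
user
user/work
user/work/web
user/work/web/jekyll
user/work/web/jekyll/_config.yml
user/work/notes/todo.txt
...

With --include "web/jekyll", everything below user/work/web/jekyll would end up in /tmp/restore, while user/work/notes/todo.txt would be skipped because the pattern web/jekyll does not occur anywhere in its path.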

The easiest (and possibly also the fastest) option would be to use the fuse mount. Have you tried that yet?
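For reference, a minimal sketch of the fuse route (the mount point is arbitrary, and the exact directory layout under it may differ between restic versions, so browse it first to find your snapshot):

$ mkdir /tmp/restic-mnt
$ restic mount /tmp/restic-mnt          # keep this running in one terminal
$ ls /tmp/restic-mnt/snapshots          # in a second terminal, locate the snapshot
$ cp -a /tmp/restic-mnt/snapshots/<snapshot>/user/work/web/jekyll /tmp/restore/

When you're done, stop restic mount (Ctrl-C) to unmount the directory.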

@fd0
Member

fd0 commented Dec 14, 2015

If you need more support, feel free to drop into the IRC channel; I'm available now.

fd0 added the type: question/problem (usage questions or problem reports) label Dec 14, 2015
@fd0
Member

fd0 commented Dec 14, 2015

Added #374 to track the bug you discovered.

@lbennett-stacki
Author

Thanks @fd0, I took the FUSE mount route and it worked great!

In terms of optimizing restic for services like S3, did the owner(s) want to take restic down the route of repo modules specifically optimized for each service, or did they want those repo modules to be a close abstraction of the backend interface, with the restic core orchestrating most of the heavy lifting?

For example, for the S3 repo backend, a bit like its restoration behavior, I believe it writes a single blob of any one pack and then in turn puts that single blob to the S3 bucket. Would there be a performance increase from queuing a number of blobs to be batch-uploaded using the S3 multipart functionality?

@fd0
Member

fd0 commented Dec 15, 2015

Then I declare your restore operation a success. If you agree, please close this issue. If you have an idea on how to improve the documentation, please create a new issue so we can track it.

@fd0
Member

fd0 commented Dec 15, 2015

I'll respond here:

> In terms of optimizing restic for services like S3, did the owner(s) want to take restic down the route of repo modules specifically optimized for each service, or did they want those repo modules to be a close abstraction of the backend interface, with the restic core orchestrating most of the heavy lifting?

I think we'll try to make the backend implementations as thin as possible and have the core take care of e.g. retries. Otherwise we'd need to implement that for every single backend, which would probably lead to a lot of code duplication.

> For example, for the S3 repo backend, a bit like its restoration behavior, I believe it writes a single blob of any one pack and then in turn puts that single blob to the S3 bucket.

I don't think that's right. The S3 backend (like any other backend) stores packs, which consist of one or more blobs. Typically, there are between four and 2000 blobs within a single pack.
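If you're curious, depending on your restic version you may be able to peek at this structure yourself with the list command (the subcommand arguments here are an assumption based on current restic and may differ in older versions):

$ restic list packs     # prints one ID per pack file stored in the repository
$ restic list blobs     # prints one entry per blob known to the index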

> Would there be a performance increase from queuing a number of blobs to be batch-uploaded using the S3 multipart functionality?

I think that's already optimized quite well at the blob/pack abstraction level; that should be enough.

As I already wrote, the restore process is not optimized at all, which means that blobs are fetched sequentially rather than concurrently. In contrast, the backup process is already heavily optimized to write many packs in parallel.

@lbennett-stacki
Author

@fd0 - Thank you for clearing up some of the confusion; I'm loving what this project can offer. I'll close this, as my issues have been dealt with. Best of luck!

@fd0
Member

fd0 commented Dec 15, 2015

Thanks, and please continue to open issues for all things you come across that aren't clear ;)

I'm glad you have your data back.
