
Restoring with exclude/include confusion #373

Closed
lbennett-stacki opened this issue Dec 14, 2015 · 8 comments
Labels
type: question/problem (usage questions or problem reports)

Comments

@lbennett-stacki

I was wondering if someone could briefly explain the exclude/include functionality behind the snapshot restoration feature? I do not wish to annoy anyone by bringing this matter up as a GitHub issue; I am simply under a lot of pressure to complete a project, and the project now exists only in my remote S3 restic repo because yesterday my Mac decided to melt its logic board.

I made a backup of my whole system a couple of days ago, and today I would like to restore only a specific directory from within that single snapshot.

First of all, is the exclude/include functionality used to restore specific directories/files from within a single snapshot or is it used to restore single snapshots containing those specific directories/files?

I have tried many restore commands and all of them seem to be running very slowly for what they should be restoring.

The restoration processes are still running, so I'm not 100% certain of the outcome, but the time they are taking raises the suspicion that they are not simply finding those patterns and restoring the matches to the target location.

I've reached the conclusion that either my command isn't doing what I expect it to do, namely restoring just a specified directory, or it is doing what I expected but not as directly as I expected. Maybe it is querying each element within the repo and restoring the ones that match the pattern? The latter seems most likely to me, given the use of the word 'pattern' rather than 'path' in the include/exclude help documentation.

I apologise for not just looking through the design docs and source code to deduce the answer myself; I need the support of the GitHub community more than ever!

@fd0
Member

fd0 commented Dec 14, 2015

Hey, thanks for reaching out. Your description helps us a lot; it shows that we need to improve the documentation for the restore command.

The include/exclude patterns are used to only restore matching files and directories from one snapshot.

First of all: unfortunately the restore process is not yet optimized and runs sequentially. Especially for remote repositories like S3, this means that the latency to the storage location adds up to a lot.

Now, for the concrete problem you have, I see two solutions:

  • With the restic restore filters, you can limit the restore to a directory by using --include
  • With the fuse mount, you can mount the repository to a local directory and browse the snapshots; you can then use cp or rsync or even the Finder to copy data out.

For the restore filter there are two modes. For restoring only a specific directory from the snapshot, using --include once is sufficient.

I'll give an example. Suppose you've created a snapshot of your home directory, /home/user, by running:

$ restic backup /home/user

This means that internally, restic creates the top-level directory user and puts the files in there. You can check that by running restic ls $SNAPSHOT_ID. If you only want to restore everything below /home/user/work/web/jekyll to /tmp/restore, you can run restic as follows:

$ restic restore --target /tmp/restore --include "web/jekyll" $SNAPSHOT_ID

With this include filter, restic checks, for each file/item, whether the pattern web/jekyll matches somewhere in its path. I've just discovered that specifying an absolute path as an include pattern currently does not work and restores nothing; I'll add an issue about it.
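To make that concrete, here is a purely illustrative session (the snapshot ID, file names, and exact output format are made up; run restic snapshots and restic ls against your own repository to see the real values):

$ restic ls a1b2c3d4
user
user/work
user/work/web
user/work/web/jekyll
user/work/web/jekyll/_config.yml
user/work/notes/todo.txt
...

With --include "web/jekyll", everything below user/work/web/jekyll would end up in /tmp/restore, while user/work/notes/todo.txt would be skipped because the pattern web/jekyll does not occur anywhere in its path.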

The easiest (and possibly also the fastest) option would be to use the fuse mount. Have you tried that yet?
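For reference, a minimal sketch of the fuse route (the mount point is arbitrary, and the exact directory layout under it may differ between restic versions, so browse it first to find your snapshot):

$ mkdir /tmp/restic-mnt
$ restic mount /tmp/restic-mnt          # keep this running in one terminal
$ ls /tmp/restic-mnt/snapshots          # in a second terminal, locate the snapshot
$ cp -a /tmp/restic-mnt/snapshots/<snapshot>/user/work/web/jekyll /tmp/restore/

When you're done, stop restic mount (Ctrl-C) to unmount the directory.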

@fd0
Member

fd0 commented Dec 14, 2015

If you need more support, feel free to drop into the IRC channel; I'm available now.

fd0 added the type: question/problem (usage questions or problem reports) label Dec 14, 2015
@fd0
Member

fd0 commented Dec 14, 2015

Added #374 to track the bug you discovered.

@lbennett-stacki
Author

Thanks @fd0, I took the FUSE mount route and it worked great!

In terms of optimizing restic for services like S3, did the owner(s) want to take restic down the route of repo modules specifically optimized for each service, or did they want those repo modules to be a close abstraction of the backend interface, with the restic core orchestrating most of the heavy lifting?

For example, for the S3 repo backend, a bit like its restoration behavior, I believe it writes a single blob of any one pack and then in turn puts that single blob to the S3 bucket. Would there be a performance increase from queuing a number of blobs to be batch-uploaded using the S3 multipart functionality?

@fd0
Member

fd0 commented Dec 15, 2015

Then I declare your restore operation a success. If you agree, please close this issue. If you have an idea on how to improve the documentation, please create a new issue so we can track it.

@fd0
Member

fd0 commented Dec 15, 2015

I'll respond here:

> In terms of optimizing restic for services like S3, did the owner(s) want to take restic down the route of repo modules specifically optimized for each service, or did they want those repo modules to be a close abstraction of the backend interface, with the restic core orchestrating most of the heavy lifting?

I think we'll try to make the backend implementations as thin as possible and have the core take care of e.g. retries. Otherwise we'd need to implement that for every single backend, which would probably lead to a lot of code duplication.

> For example, for the S3 repo backend, a bit like its restoration behavior, I believe it writes a single blob of any one pack and then in turn puts that single blob to the S3 bucket.

I don't think that's right. The S3 backend (like any other backend) stores packs, which consist of one or more blobs. Typically, there are between four and 2000 blobs within a single pack.
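If you're curious, depending on your restic version you may be able to peek at this structure yourself with the list command (the subcommand arguments here are an assumption based on current restic and may differ in older versions):

$ restic list packs     # prints one ID per pack file stored in the repository
$ restic list blobs     # prints one entry per blob known to the index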

> Would there be a performance increase from queuing a number of blobs to be batch-uploaded using the S3 multipart functionality?

I think that's already optimized quite well at the blob/pack abstraction level; that should be enough.

As I already wrote, the restore process is not optimized at all, which means that blobs are fetched sequentially rather than concurrently. In contrast, the backup process is already heavily optimized to write many packs in parallel.

@lbennett-stacki
Author

@fd0 - Thank you for clearing up some of the confusion; I'm loving what this project can offer. I'll close this, as my issues have been dealt with. Best of luck!

@fd0
Member

fd0 commented Dec 15, 2015

Thanks, and please continue to open issues for all things you come across that aren't clear ;)

I'm glad you have your data back.
