Parent-snapshot detection fails with changing --files-from #2246

@BenWiederhake

Output of restic version

restic 0.9.4 compiled with go1.11.5 on linux/amd64

(debian testing, from the debian repository)

What should restic do differently? Which functionality do you think we should add?

In short: With an invocation like restic backup --files-from my-files-and-dirs.lst, restic could be more efficient about choosing a parent snapshot.

In long:

Whenever my-files-and-dirs.lst changes, no matter how slightly, restic apparently sees a different set of paths. Only slightly different, but still different. When searching for a parent snapshot during backup, this leads to unexpected behavior:

On the one hand, restic sees that the path set is different from anything it has seen before, and assumes a completely new backup. All files are scanned again, even though only changed data is uploaded:

repository 7b6b235d opened successfully, password is correct
Files:        1905 new,     0 changed,     0 unmodified
Dirs:            4 new,     0 changed,     0 unmodified
Added to the repo: 44.372 KiB
processed 1905 files, 346.776 MiB in 0:02
snapshot e50ef85d saved

On the other hand, I made only a small change in my-files-and-dirs.lst, so I expected that only the new files need to be uploaded.
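To make the observed behavior concrete, here is a toy model in Python. It assumes that parent detection requires an exact match on the path set, which is what the output above suggests; it is not restic's actual code, and the snapshot records, IDs, and the `find_parent_exact` helper are hypothetical.

```python
from typing import Optional

# Hypothetical snapshot history: each entry records the exact path list
# that was passed to the backup run (an assumption for illustration).
snapshots = [
    {"id": "aaaa1111", "time": 1, "paths": ["/home/u/bin", "/home/u/workspace"]},
    {"id": "bbbb2222", "time": 2, "paths": ["/home/u/bin", "/home/u/workspace"]},
]


def find_parent_exact(history, paths) -> Optional[str]:
    """Return the newest snapshot whose path list equals `paths`, else None."""
    wanted = sorted(paths)
    candidates = [s for s in history if sorted(s["paths"]) == wanted]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s["time"])["id"]


# Same list as before (order does not matter): a parent is found.
same = find_parent_exact(snapshots, ["/home/u/workspace", "/home/u/bin"])

# One extra entry in the list: no snapshot matches, so no parent is used.
changed = find_parent_exact(
    snapshots, ["/home/u/workspace", "/home/u/bin", "/home/u/.bashrc"]
)
```

Under this assumption, adding a single line to the list is enough to lose the parent, which matches the "1905 new, 0 unmodified" output even though almost nothing changed.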

I'm new to restic, so maybe I'm using it wrong. However, using tags does not seem to change automatic parent detection, and --parent latest does not seem to be supported. And I don't want to specify --parent 12345678 manually all the time, and would like to avoid fiddling with restic snapshots on my own.

I'm not sure which feature to propose. There are multiple things that might help:

  1. Allow --parent latest to use the latest snapshot, no matter what. This would be helpful for people like me, who only have one guest per repository anyway, but might cause problems in other scenarios.
  2. Instead of 1., allow --parent latest-sametags to use the latest snapshot of the same tag set. This would avoid potential problems, and still cover most use cases.
  3. Automatic parent detection could try to find a close match in the previous few snapshots, and if it finds one, use that. As far as I can see, a false positive cannot have a bad impact, can it?
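Option 3 could be sketched roughly as follows. This is only an illustration of the idea, not a proposal for restic's implementation: the Jaccard similarity measure, the `lookback` and `threshold` parameters, and the snapshot records are all assumptions made for the example.

```python
from typing import Optional


def jaccard(a, b) -> float:
    """Similarity of two path lists: shared paths / all distinct paths."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_parent_close(history, paths, lookback=5, threshold=0.5) -> Optional[str]:
    """Pick the best-matching snapshot among the `lookback` most recent ones.

    Returns None when no candidate reaches `threshold`. On equal scores the
    newer snapshot wins, because candidates are visited newest first and a
    later candidate must score strictly higher to replace the current best.
    """
    recent = sorted(history, key=lambda s: s["time"], reverse=True)[:lookback]
    best, best_score = None, 0.0
    for snap in recent:
        score = jaccard(snap["paths"], paths)
        if score >= threshold and score > best_score:
            best, best_score = snap, score
    return best["id"] if best else None


# Hypothetical history, as if taken from earlier backup runs.
history = [
    {"id": "aaaa1111", "time": 1, "paths": ["/home/u/bin", "/home/u/workspace"]},
    {"id": "bbbb2222", "time": 2, "paths": ["/home/u/bin", "/home/u/workspace"]},
]

# One path added to the list: 2 of 3 distinct paths are shared.
close = find_parent_close(
    history, ["/home/u/bin", "/home/u/workspace", "/home/u/.bashrc"]
)

# A completely unrelated list: nothing is shared, so no parent is chosen.
unrelated = find_parent_close(history, ["/srv/www"])
```

With a slightly changed list the newest similar snapshot is still selected, while an unrelated list falls back to a backup without a parent, which is the "only when detection fails" case.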

What are you trying to do?

I'm making snapshots of parts of my home directory, and have restrictions on the target repository size. So I only want to include specific things, like ~/workspace/, ~/bin/, ~/.bashrc, and so on, but not other things, like the gigantic folder of virtual machines, as the restic host does not have enough space for that. Obviously, this list is subject to change.

With the current behavior, a simple backup without --parent 5eab0a7 makes the backup run a bit longer than necessary. Deduplication does its job perfectly, and no excess data is stored.

With the proposed behavior, no such delay would happen, or only when detection fails.

Did restic help you or make you happy in any way?

I'm currently using rsnapshot, and it works great for its use case. However, with so many small files, and some large files constantly changing (thunderbird is a strong offender), my home directory falls outside rsnapshot's use case. It seems restic is fast and space-efficient enough to cope with that much better. Hooray!


Just for the record: I no longer have this particular issue, since I personally can just avoid it. I don't close the issue because I recognize that other people do run into trouble due to this issue.
