Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for --prefix and --strip-components options on backup #2010

Open
wants to merge 16 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@giacomocariello
Copy link

giacomocariello commented Sep 25, 2018

What is the purpose of this change? What does it change?

Adds --prefix and --strip-components to "restic backup" command. These options allow the user to manipulate the root directory paths saved in the snapshot.

Was the change discussed in an issue or in the forum before?

It was previously discussed in issue #555

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Sep 25, 2018

Codecov Report

Merging #2010 into master will decrease coverage by 4.81%.
The diff coverage is 72.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2010      +/-   ##
==========================================
- Coverage   50.03%   45.21%   -4.82%     
==========================================
  Files         169      165       -4     
  Lines       13563    13538      -25     
==========================================
- Hits         6786     6121     -665     
- Misses       5759     6467     +708     
+ Partials     1018      950      -68
Impacted Files Coverage Δ
internal/archiver/archiver.go 68.62% <100%> (ø) ⬆️
internal/restic/testing.go 79.8% <100%> (ø) ⬆️
cmd/restic/cmd_backup.go 45.28% <100%> (+0.8%) ⬆️
internal/restic/snapshot.go 48.71% <65.21%> (+2.25%) ⬆️
internal/backend/b2/b2.go 0% <0%> (-78.95%) ⬇️
internal/backend/swift/swift.go 0% <0%> (-78.45%) ⬇️
internal/backend/gs/gs.go 0% <0%> (-73.61%) ⬇️
internal/backend/azure/azure.go 0% <0%> (-69.46%) ⬇️
internal/backend/swift/config.go 36.95% <0%> (-54.35%) ⬇️
cmd/restic/cmd_mount.go 12.98% <0%> (-44.16%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c6901ff...43d1a2f. Read the comment docs.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Sep 25, 2018

Hey, thank you very much for giving this a try! I don't have time to look at the code today, but in order for it to get merged eventually, it will need extensive tests in the archiver package, here: https://github.com/restic/restic/blob/master/internal/archiver/archiver_test.go

In my experience, adding such tests reveals many subtle bugs, so maybe you'll find the one you're looking for while adding tests :)

Also, there are a few minor issues with the commits:

  • There's an odd merge commit in this PR which needs to be removed
  • Use present tense ("add" instead of "added", describe what the commit does)
  • Shorten the subject lines
  • The Minor Fix commit should be merged into the commit it corrects
  • Don't change the man pages, they are auto-generated via restic generate, no need to check this in (so you can drop the commits)

Thanks!

@giacomocariello giacomocariello force-pushed the gstruct:prefix_strip_feat branch 5 times, most recently from 101fbef to da727f0 Sep 25, 2018

@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Sep 25, 2018

Okay, I did some cleanup of the commits to solve the issues you pointed out. Regarding the tests, I need some time to figure out what's expected. I posted this initial draft mainly to make sure that the way I conceived it is acceptable from your point of view.

@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Sep 25, 2018

Okay, I think I figured out the problem: I need to replicate target mangling behavior in internal/restic function FindLatestSnapshot(), otherwise it won't match parent and behave as if new snapshot were an orphan.

@giacomocariello giacomocariello force-pushed the gstruct:prefix_strip_feat branch 2 times, most recently from 7ffbae2 to 43d1a2f Sep 25, 2018

@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Sep 25, 2018

Okay, the problem I mentioned in previous comments is solved. Now it's only a matter of writing the tests.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Sep 26, 2018

Do you need some guidance on that? I could draft the test scaffolding or so...

@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Sep 26, 2018

That would be great! I was thinking of extending TestArchiverSnapshot with prefix and strip fields, but certainly an expert drafting would really help.

@fd0
Copy link
Member

fd0 left a comment

A few comments I noticed while reading the code. Apparently, the code does not work (yet), did you know that?

$ echo $PWD
/home/fd0/shared/work/go/src/github.com/restic/restic/cmd/restic

$ ./restic backup $PWD
[...]

$ ./restic ls -l latest
snapshot a11d3cd9 of [/home/fd0/shared/work/go/src/github.com/restic/restic/cmd/restic] filtered by [] at 2018-09-26 14:30:50.723768212 +0200 CEST):
drwxr-xr-x     0     0      0 2016-11-25 18:24:06 /home
drwxr-xr-x  1000  1000      0 2018-09-26 14:27:52 /home/fd0
drwxr--r--  1000   100      0 2018-08-30 22:04:44 /home/fd0/shared
drwxr-xr-x  1000   100      0 2017-08-06 17:37:27 /home/fd0/shared/work

$ ./restic backup --strip-components 2 $PWD
[...]

$ ./restic ls -l latest | head -n 5
snapshot f576eda7 of [/home/fd0/shared/work/go/src/github.com/restic/restic/cmd/restic] filtered by [] at 2018-09-26 14:31:41.992084043 +0200 CEST):
drwxr-xr-x     0     0      0 2016-11-25 18:24:06 /home
drwxr-xr-x  1000  1000      0 2018-09-26 14:27:52 /home/fd0
drwxr--r--  1000   100      0 2018-08-30 22:04:44 /home/fd0/shared
drwxr-xr-x  1000   100      0 2017-08-06 17:37:27 /home/fd0/shared/work

It's the same for --prefix.

Show resolved Hide resolved cmd/restic/cmd_backup.go Outdated
Show resolved Hide resolved internal/restic/snapshot.go
return
}

func pathMangle(path string, prefix string, strip int) string {

This comment has been minimized.

Copy link
@fd0

fd0 Sep 26, 2018

Member

We need to find a better name for this function, and it also needs extensive tests. What about pathStripPrefix?

This comment has been minimized.

Copy link
@giacomocariello

giacomocariello Sep 26, 2018

Author

pathStripPrefix is okay, I'll rename it, however it's also responsible for making the path absolute where expected.

Show resolved Hide resolved internal/restic/snapshot.go
Show resolved Hide resolved internal/restic/snapshot_find.go
Show resolved Hide resolved cmd/restic/cmd_backup.go
@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Sep 26, 2018

Ok, I pushed a commit with test scaffolds and a few quick test cases I came up with, some comments:

  • TestPathSplit() fails for a few cases because I could not find out what the function does exactly. Can you please add documentation comment to it? :)
  • TestPathMangle() panics for one of the test cases :)
  • Please rename TestPathMangle() together with the function.
@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Sep 26, 2018

Okay, I'll answer the specs question now and do the actual fixing/coding tonight.

@giacomocariello giacomocariello force-pushed the gstruct:prefix_strip_feat branch from d1a830b to 64348de Sep 26, 2018

@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Sep 26, 2018

I have some problem with TestArchiverSnapshotPrefixStrip: it's complaining that the internal structure of the snapshot filesystem doesn't contain prefix and/or is not stripped, but file paths aren't supposed to change, only root directory field. Am I missing anything?

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Oct 16, 2018

I have some problem with TestArchiverSnapshotPrefixStrip: it's complaining that the internal structure of the snapshot filesystem doesn't contain prefix and/or is not stripped, but file paths aren't supposed to change, only root directory field. Am I missing anything?

I think we need to talk about what this change is about. There are two places in which the path to a file/dir to backup is relevant:

  • The path recorded in the snapshot (used to detect if there has been a backup of exactly this path before)
  • The structure within the snapshot

Maybe we have a misunderstanding here: I was talking about the second point, --prefix and --strip-components should modify the structure within the snapshot. Did you mabye understand something different? :)

Eventually we will need to address both, but reworking the logic for the first item is even more complex.

@@ -341,3 +341,25 @@ func TestEnsureSnapshot(t testing.TB, repo restic.Repository, snapshotID restic.

TestEnsureTree(ctx, t, "/", repo, *sn.Tree, dir)
}

// TestEnsureSnapshotPath tests if the snapshot in the repo has exactly the same

This comment has been minimized.

Copy link
@houndci-bot

houndci-bot Oct 16, 2018

comment on exported function TestEnsureSnapshotPaths should be of the form "TestEnsureSnapshotPaths ..."

@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Oct 16, 2018

In my PR, I'm not changing the structure of the snapshot, only the recorded path. The rationale for this solution is to allow different "source" paths to look equal upon backup, in order to make parent selection work as if the backup were from the same source path.
Specifically, this solves the problem of creating backups from ZFS daily snapshots, where the root folder of the backup is /mountpoint/.zfs/snapshot/snapshot_name but we want to record the path as /mountpoint.
IMHO, there's no need for a solution that changes the structure of the snapshot itself, as the recovery path can be specified to include a prefix upon restore.

@giacomocariello giacomocariello force-pushed the gstruct:prefix_strip_feat branch from 4b59c84 to 910b9dd Oct 16, 2018

@giacomocariello giacomocariello force-pushed the gstruct:prefix_strip_feat branch from 596509c to 8a8e53d Oct 16, 2018

@grawity

This comment has been minimized.

Copy link

grawity commented Oct 16, 2018

Thanks for this feature. I use restic on a fileserver, and ever since trying 0.9.x I wished I could avoid its storage of absolute paths completely (just as 0.8.x used to work). The ability to --strip-components at backup time will be very useful, and IMHO it would be a great addition to the restore subcommand as well.

(For example, today I'm backing up D:\Shares\Users\, some months later it might be migrated to E:\Users\ or even \\anotherserver\Users\, so I wouldn't want to store anything more than just the relative path Users\. More than that, when the day comes to restore some data, the destination might be mounted somewhere else entirely...)

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Oct 16, 2018

@grawity This feature is only slightly related to what you're trying to do, as this PR won't change the paths the data is stored at in a snapshot. To do what you want, you can just pass restic a relative path:

D:
cd \Shares
restic backup Users

The top-level directory in the next snapshot will be Users, and you can restore this as you wish.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Oct 16, 2018

In my PR, I'm not changing the structure of the snapshot, only the recorded path. The rationale for this solution is to allow different "source" paths to look equal upon backup, in order to make parent selection work as if the backup were from the same source path.

Ah, I see. Then we did talk about different things. The problem is that it's only half of the story.

Let's say I have a file called foo.txt in a zfs volume called bigdata. I make a zfs snapshot of the volume, which is the mounted at /mnt/bigdata/zfs-snapshot-20181016. I can call restic in several ways to make a backup of the snapshot, this varies at which path the file foo.txt in the snapshot is stored:

restic path to foo.txt
restic backup /mnt/bigdata/zfs-snapshot-20181016 /mnt/bigdata/zfs-snapshot-20181016/foo.txt
cd /mnt/bigdata; restic backup zfs-snapshot-20181016 /zfs-snapshot-20181016/foo.txt
cd /mnt/bigdata/zfs-snapshot-20181016; restic backup . /foo.txt

For each of these snapshots, restic records the (same) absolute path of the content in the snapshot file in the repo. So if I were to change the call to restic (e.g. from the first to the second row in the table), restic will find a parent snapshot, but it will be useless because the path within then snapshot has changed. That's not a big deal, but the files will be re-read. The parent detection code is too simple to recognize this, and we don't record the current working directory for restic (yet).

So, let's address stripping and/or prefixing the paths recorded in the snapshot in this PR, and tackle the rest later. I think you need to remove some of my test cases then, all those which test the internal structure of the snapshot.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Oct 16, 2018

Hm. Thinking about this, what do we do in the following case:

restic backup --strip-components 2 /home/user/work /home/other/work

This would produce /work, /work as the paths. Is that a good thing to have?

We also need to come up with different names for the command-line option --path-components, because this option is used by tar and operates on the content, so I think we should not use the same name for restic which does not operate on the content.

To be honest, I'm not sure if this is the right approach to "fix" parent detection.

@giacomocariello

This comment has been minimized.

Copy link
Author

giacomocariello commented Oct 16, 2018

@fd0 regarding your comment on foo.txt, when the path is relative, the cwd becomes the root, which seems reasonable to me and similar to tar (if you wanted to mimic tar, we could add -C to chdir before listing).

Regarding multiple targets with the same final path, while the two root paths are recorded as the same path, the actual file paths are not stripped, so they wouldn't collide. However, that doesn't seem particularly useful. Probably, using relative paths with --prefix and eventually -C (see above) would be preferred.
On the other side, --strip-components would probably help on recover command (as is on tar extract), to recover subpaths without the whole file hierarchy, which would require a later mv.

Regarding the testing, I changed your snapshot tree contents test to only check for snapshot paths, but it's somewhat a duplicate of the other test. Please let me know if you'd like to have it zapped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.