Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to dump folders to tar via stdout #2124

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@Kidswiss
Copy link

Kidswiss commented Dec 24, 2018

What is the purpose of this change? What does it change?

This enhances the dump command so that it will recognise if it was called on a file or a folder.

  • If it's a file it will just dump its contents to stdout as before

  • if it's a folder it will walk through the tree with the provided folder as its root. For each file it finds in the tree it adds it to the tar which in turn gets dumped to the stdout.

This implementation is a POC to test the viability and the speed of this solution, so it's not finished in terms of what it restores:

  • currently no xattrs
  • currently no links

I tested this with a repository consisting of about 500 files and about 1.3Gb data. The speed compared to a normal restore is practically the same (even a small bit faster). And I'd like this pull request to serve as a discussion ground to finish this cleanly.

Was the change discussed in an issue or in the forum before?

This is the merge request for issue #2123.
Closes #2123

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Dec 24, 2018

Codecov Report

Merging #2124 into master will decrease coverage by 4.34%.
The diff coverage is 35.33%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2124      +/-   ##
=========================================
- Coverage   51.05%   46.7%   -4.35%     
=========================================
  Files         176     173       -3     
  Lines       14348   14442      +94     
=========================================
- Hits         7325    6745     -580     
- Misses       5951    6674     +723     
+ Partials     1072    1023      -49
Impacted Files Coverage Δ
cmd/restic/cmd_dump.go 3.89% <0%> (-2.49%) ⬇️
cmd/restic/acl.go 76.81% <76.81%> (ø)
internal/backend/b2/b2.go 0% <0%> (-80.69%) ⬇️
internal/backend/swift/swift.go 0% <0%> (-78.83%) ⬇️
internal/backend/gs/gs.go 0% <0%> (-74%) ⬇️
internal/backend/azure/azure.go 0% <0%> (-69.46%) ⬇️
internal/backend/swift/config.go 34.69% <0%> (-57.15%) ⬇️
internal/archiver/file_saver.go 81.57% <0%> (-7.02%) ⬇️
internal/archiver/blob_saver.go 95.06% <0%> (-4.94%) ⬇️
internal/backend/s3/s3.go 59.39% <0%> (-0.76%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ecc2458...76aee0a. Read the comment docs.

@Kidswiss Kidswiss changed the title Tar Ability to tar a whole folder to stdout Dec 24, 2018

@Kidswiss Kidswiss changed the title Ability to tar a whole folder to stdout Ability to dump folders to tar via stdout Dec 24, 2018

@Kidswiss Kidswiss force-pushed the Kidswiss:tar branch from 1a69c3f to 8d85723 Jan 14, 2019

@Kidswiss

This comment has been minimized.

Copy link
Author

Kidswiss commented Jan 14, 2019

The build is currently failing due to the method I use to write the xattrs to the tar file. There are two ways to handle the xattrs: https://golang.org/pkg/archive/tar/#Header

Xattrs map[string]string // Go 1.3

and

//  h.Xattrs[key] = value
//  h.PAXRecords["SCHILY.xattr."+key] = value
PAXRecords map[string]string // Go 1.10

As the Xattrs map is deprecated I used PAXRecords which is only available in go 1.10+. So the build for go 1.9 fails.

Any suggestions how I should handle this? One possibility would be to use build tags and implement both versions (not a fan of this but if there's no other way). Also just for curiosity: why is there still a build that uses go 1.9? Is this for some backward compatibility?

@fd0
Copy link
Member

fd0 left a comment

Hey, thank you very much for your contribution, this function is great! Please accept my apologies for not reviewing this earlier, it's a rough time right now for me (at least concerning "spare time for restic in the evenings") ;)

As for your question: we try to support older versions of Go mainly so that people on Debian stable can build it. At the moment, the latest version for it available via the backports repo is 1.11, so I don't think there's a need to support Go 1.9 any more. You have my approval to remove Go 1.9 from the CI tests, but please mention it in the changelog file. Let me know if you need help with that!

A few issues I noticed while reviewing it:

  • Dumping a whole snapshot via restic dump latest / does not work, it prints: cannot dump file: path "" not found in snapshot
  • I noticed that the current code does not include any information about directories in the tar, is that intentional?
  • Something isn't quite right with extended attributes, like ACL, my test case:
mkdir test
echo foo > test/file
setfacl -m user:nobody:rwx test/file
chmod 700 test

Save this in a repository, then run tar on it gives:

./restic dump latest /test | tar --acls -tvp
-rw-rwxr--  1000/100          4 2019-02-23 11:47 test/file

It looks like there's no information about the ACL in it... Can you have a look? :)

Show resolved Hide resolved cmd/restic/cmd_dump.go Outdated
@Kidswiss

This comment has been minimized.

Copy link
Author

Kidswiss commented Feb 26, 2019

Hey

Thank you very much for the feedback. Keep hanging in there, you get through that rough time :)

  • Dumping a whole snapshot via restic dump latest / does not work, it prints: cannot dump file: path "" not found in snapshot

I thought I fixed that one already... Maybe I forgot to push the commit. I'll have a look again.

  • I noticed that the current code does not include any information about directories in the tar, is that intentional?

You mean like mtime,ctime etc? To be honest I didn't really think about adding them. Good point, thx.

  • Something isn't quite right with extended attributes, like ACL, my test case:

I'll have a look at that. I developed and tested the feature under OSX where the whole tar/xattr story is somewhat weird and broken. I'll try to replicate your test and fix this under Linux.

Best Regards
Simon

@Kidswiss Kidswiss force-pushed the Kidswiss:tar branch from 8d85723 to 091687b Mar 1, 2019

Show resolved Hide resolved cmd/restic/acl.go Outdated
Show resolved Hide resolved cmd/restic/acl.go Outdated
@Kidswiss

This comment has been minimized.

Copy link
Author

Kidswiss commented Mar 1, 2019

I found the issue with the ACLs and tar. ACLs have to be encoded completely different for a tar, even though they are technically saved as an xattr. The ACLs are saved in some binary format in the xattrs.

Examples:

# Get the files ACLs
vagrant@golang:~$ getfacl backup/main.go
# file: backup.old/main.go
# owner: vagrant
# group: vagrant
user::rw-
user:root:rwx
user:nobody:rwx
group::rwx
mask::rwx
other::r--

# Get the files xattrs
vagrant@golang:~$ getfattr -d -m - backup/main.go 
# file: backup.old/main.go
system.posix_acl_access=0sAgAAAAEABgD/////AgAHAAAAAAACAAcA/v8AAAQABwD/////EAAHAP////8gAAQA/////w==
user.asd="bar"
user.foo="bar" 

# Tar header tar --create --xattrs  --acls test.tar backup/main.go
vagrant@golang:~$ cat test.tar
backup.old/PaxHeaders.6059/main.go0000644000000000000000000000047113436167053014051 xustar0030 mtime=1551429163.710267245
30 atime=1551430659.857967244
30 ctime=1551430766.967495243
# Correctly encoded ACLs in the tar header
80 SCHILY.acl.access=user::rw-
user:nobody:rwx
group::rwx
mask::rwx
other::r--
# Normal xattrs
29 SCHILY.xattr.user.foo=bar
29 SCHILY.xattr.user.asd=bar
# Even though the ACLs were written above the tar command adds a garbled mess here like my implementation...
85 SCHILY.xattr.system.posix_acl_access=?????????????? ????
backup.old/main.go0000674000175000017500000000055213436167053015556 0ustar00vagrantvagrant00000000000000package main
<SNIP>

While researching about this encoding I found this implementation in go: https://github.com/maxymania/go-system/blob/master/posix_acl/posix_acl.go. Unfortunately this is actually the only information I was able to find how that encoding actually works. It might be burried somewhere deep inside some POSIX specification.

I added a modified version of that code to Restic which does actually write the correct ACL headers:

vagrant@golang:/repos/restic/cmd/restic$ ./restic dump 1bb70a9b /home/vagrant/backup | tar --acls --xattrs -tvvp
-rw-rwxr--+ 1000/1000       362 2019-03-01 08:32 home/vagrant/backup/main.go
  a: user::rw,u:65534:rwx,group::rw,mask::rwx,other::r
  x: 3 user.asd
  x: 3 user.foo

The only difference here is that the user gets saved with its ID instead of the actual name provided by the system.

@Kidswiss Kidswiss force-pushed the Kidswiss:tar branch from 091687b to 1fad91c Mar 8, 2019

@Kidswiss Kidswiss force-pushed the Kidswiss:tar branch 4 times, most recently from 4ae3946 to d2a4c5f Mar 18, 2019

@Kidswiss

This comment has been minimized.

Copy link
Author

Kidswiss commented Mar 18, 2019

Sooo I finally found the time to finish this. I finished the implementation of the ACL parsing and also added docs and the changelog for the feature.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Mar 18, 2019

Thank you very much! I'll try to find time to review and merge this the next couple of days!

@Kidswiss Kidswiss force-pushed the Kidswiss:tar branch from d2a4c5f to 57c47a6 Mar 19, 2019

@fd0

fd0 approved these changes Apr 13, 2019

@fd0
Copy link
Member

fd0 left a comment

I had a look, it looks great! I have two remarks:

  • I found it very odd that the tar output seems to drop the top-level directory name:
$ mkdir foo/bar/baz
$ echo test > foo/bar/baz/test

$ ./restic backup foo
[...]

$ ./restic ls latest
/foo
/foo/bar
/foo/bar/baz
/foo/bar/baz/test

$ ./restic dump latest /foo/bar/baz/test
test

$ ./restic dump latest / | tar tv
drwxr-xr-x 1000/100          0 2019-04-13 14:38 /bar
drwxr-xr-x 1000/100          0 2019-04-13 14:38 /bar/baz
-rw-r--r-- 1000/100          5 2019-04-13 14:38 /bar/baz/test

$ ./restic dump latest /foo | tar tv
drwxr-xr-x 1000/100          0 2019-04-13 14:38 /bar
drwxr-xr-x 1000/100          0 2019-04-13 14:38 /bar/baz
-rw-r--r-- 1000/100          5 2019-04-13 14:38 /bar/baz/test
  • I'd like to suggest adding a check that if stdout is a terminal, we return an error instead of throwing the (usually binary) tar output at the user. There's already stdoutIsTerminal() in cmd/restic that you could use. What do you think?

The rest looks great, I'll merge this after we agree on the two points above. Thank you very much for your work!

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Apr 13, 2019

FYI, I've pushed a commit which slightly rewords the changelog entry and tests that the Go version is at least 1.10 in build.go.

@Kidswiss Kidswiss force-pushed the Kidswiss:tar branch 2 times, most recently from 5eed80f to aa145d3 Apr 15, 2019

@Kidswiss

This comment has been minimized.

Copy link
Author

Kidswiss commented Apr 15, 2019

Thanks for the feedback!

  • I had a look at the issue that the first node doesn't get tarred. That definitely a bug, that I missed during my tests. It was because walker.Walk actually starts with the next node, so the first one was completely ignored. I fixed that now. :) EDIT: I did a bit more testing on this. The top folder is now included when dumping / or /foo but if dumping a folder two levels down like /foo/bar will still ignore /foo and start at /bar. This needs a bit more investigation.

  • I like that idea and put the check at the top of tarTree. It now throws an error that the output should be redirected.

I rebased everything into one clean commit.

@fd0 fd0 referenced this pull request Apr 18, 2019

Closed

Import/export of snapshots #1910

Restore whole folder to sdtout as tar
With this change it is possible to dump a folder to stdout as a tar. The

It can be used just like the normal dump command:

`./restic dump fa97e6e1 "/data/test/" > test.tar`

Where `/data/test/` is a a folder instead of a file.

@Kidswiss Kidswiss force-pushed the Kidswiss:tar branch from aa145d3 to 76aee0a Apr 18, 2019

@Kidswiss

This comment has been minimized.

Copy link
Author

Kidswiss commented Apr 18, 2019

I THINK I now have all cases handled for this. printFromTree now accepts pathToPrint so the information is actually available where I need it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.