mtree "content based" outputs are surprising when reading tar file from standard in? #786

Open
hartzell opened this issue Sep 23, 2016 · 6 comments

Comments

@hartzell

When I output an mtree file, fields that depend on the actual contents of the file don't do the right thing:

cat moose.tar | bsdtar -cf - --format=mtree --options '!all,cksum,sha1' @-

The resulting sha1digest is the result you'd get for an empty file:

./efs/transfer/unpack-outputs/unpack.sh.o96 cksum=4294967295 sha1digest=da39a3ee5e6b4b0d3255bfef95601890afd80709
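
Both values are what empty input produces, and the cksum half can be checked without libarchive. The SHA-1 of zero bytes is the well-known constant da39a3ee5e6b4b0d3255bfef95601890afd80709, and the POSIX cksum algorithm (which, as far as I understand, the mtree cksum keyword follows) yields 4294967295 (0xFFFFFFFF) when no data is fed. A minimal sketch of that algorithm follows; the names posix_cksum and crc_step are made up for illustration and are not libarchive functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of the POSIX cksum CRC: polynomial 0x04C11DB7, MSB first. */
static uint32_t crc_step(uint32_t crc, unsigned char byte)
{
    crc ^= (uint32_t)byte << 24;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    return crc;
}

/* Hypothetical helper: checksum as computed by the POSIX cksum algorithm.
 * The CRC starts at 0, runs over the data, then over the length
 * (least-significant byte first), and the result is complemented. */
uint32_t posix_cksum(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc_step(crc, data[i]);
    /* For len == 0 this loop does not run, so crc is still 0 here. */
    for (size_t n = len; n != 0; n >>= 8)
        crc = crc_step(crc, (unsigned char)(n & 0xFF));
    return ~crc;
}
```

Calling posix_cksum(NULL, 0) returns 4294967295, the same number as the cksum= value above, which is consistent with the mtree writer never having been handed the file's bytes.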

The actual bytes are in the tar file, is there some way to use them to produce the mtree output? Would it be a hard feature to add (and/or a welcome pull request?)?

My goal is to have an mtree file that accurately represents the contents of a tar file. My current scheme is to generate an mtree file, generate a tar file, unpack the tar file, generate a new mtree file from the original input specification and diff the new and old files. Alternatively, I can unpack the archive and use the mtree command, but I'm working on CentOS and I'm unsure about ongoing support for mtree-port.

A related question: is there any way to generate multiple outputs (e.g. mtree and tar) simultaneously?

@kientzle
Contributor

I would welcome a pull request that fixed the mtree writer to correctly emit checksums in such cases. I don't think it would be particularly hard to do.

It is not possible to generate multiple outputs simultaneously with bsdtar. You could use libarchive to write a program that did this: Just open two archive handles and feed the same data to both. You can use the minitar example program as a starting point if you like.

@hartzell
Author

@kientzle -- w.r.t. writing a program that generates multiple outputs, you were right that starting from minitar made it pretty straightforward.

I'm missing (and I think that minitar is also missing) a bit of magic to get uname and gname to work in my outputs.

I noticed that, in spite of specifying them in the options (and after checking that I was generally able to specify mtree options), my mtree output never had uname or gname values. Reading libarchive/archive_write_set_format_mtree.c:968, I think that if a gname isn't available, no gname= string is written to the mtree output, even if it has been asked for, and no error is generated.

I wasn't sure why there wasn't any gname info available in my version, then noticed that it wasn't in the ustar archive I'm writing either.

Then I checked and I don't believe it's in the one that minitar generates.

If I tar up a small directory using the system's tar command (OS X 10.10.5, either tar or bsdtar) and run strings on the result, I can see a user and group name.

If I do the same experiment with minitar (or my program), the archive does not seem to have the names.

hartzelg-UXG8WL:minitar hartzelg$ tar -c -f /tmp/foo.tar ape
hartzelg-UXG8WL:minitar hartzelg$ strings /tmp/foo.tar
ape/
000755
000765
000024
00000000000 12772777230 012624
ustar
00hartzelg
staff
000000
000000
ape/foo
000644
000765
000024
00000000006 12772777230 013326
ustar
00hartzelg
staff
000000
000000
moose
hartzelg-UXG8WL:minitar hartzelg$ ./minitar -c -f /tmp/foo.tar ape
hartzelg-UXG8WL:minitar hartzelg$ strings /tmp/foo.tar
ape/
000755
000765
000024
00000000000 12772777230 010037
ustar
000000
000000
ape/foo
000644
000765
000024
00000000006 12772777230 010541
ustar
000000
000000
moose
hartzelg-UXG8WL:minitar hartzelg$

I think that there must be a bit of secret sauce that I'm missing, but walking libarchive's tar code hasn't helped me. I have a feeling that it might involve either archive_read_disk_set_standard_lookup (which minitar and I call on the things we're reading) or archive_write_disk_set_standard_lookup (which only seems relevant on extraction).

Can you toss me a hint?

@hartzell
Author

I can make minitar Do The Right Thing(TM) [and thence also my code] with this diff:

diff --git a/examples/minitar/minitar.c b/examples/minitar/minitar.c
index 81e5e11..e554f77 100644
--- a/examples/minitar/minitar.c
+++ b/examples/minitar/minitar.c
@@ -267,6 +267,7 @@ create(const char *filename, int compress, const char **argv)
                        errmsg("\n");
                        exit(1);
                }
+               archive_read_disk_set_standard_lookup(disk);

                for (;;) {
                        int needcr = 0;

Is that sensible or have I just plastered over the problem?

@hartzell
Author

This might be a more reasonable change. There doesn't seem to be any point to the earlier call to ...set_standard_lookup and my addition should be ifdef'ed to follow the existing pattern.

diff --git a/examples/minitar/minitar.c b/examples/minitar/minitar.c
index 81e5e11..755773a 100644
--- a/examples/minitar/minitar.c
+++ b/examples/minitar/minitar.c
@@ -254,9 +254,6 @@ create(const char *filename, int compress, const char **argv)
        archive_write_open_filename(a, filename);

        disk = archive_read_disk_new();
-#ifndef NO_LOOKUP
-       archive_read_disk_set_standard_lookup(disk);
-#endif
        while (*argv != NULL) {
                struct archive *disk = archive_read_disk_new();
                int r;
@@ -267,6 +264,9 @@ create(const char *filename, int compress, const char **argv)
                        errmsg("\n");
                        exit(1);
                }
+#ifndef NO_LOOKUP
+               archive_read_disk_set_standard_lookup(disk);
+#endif

                for (;;) {
                        int needcr = 0;

@kientzle
Contributor

Yes, you do need to set the uname and gname fields in the archive_entry object if you want uname and gname in the output archive. Because uname and gname lookups can be expensive and sometimes need to be handled via unusual mechanisms, the archive_read_disk engine does not do any uname or gname lookup by default. The libarchive library does provide an optional "standard" uname and gname lookup machinery that uses common POSIX functions and works well for common applications.

I'd appreciate a Pull Request with your suggested minitar change. A one-line comment explaining that this enables automatic uname and gname lookups would be nice as well.

@hartzell
Author

I filed PR #791 that addresses this.

2 participants