Executable bit in ZIP-archives get thrown away when reading from stdin. #1106

Closed
felixbecker2 opened this issue Dec 11, 2018 · 7 comments

@felixbecker2

commented Dec 11, 2018

I found that bsdtar from the libarchive package (at least on Arch Linux) drops the executable bit of files in .zip archives when it reads the archive from stdin, but not when it reads the archive file directly.

For .tar archives, it preserves the executable bit when reading from stdin as well.

bsdtar --version: bsdtar 3.3.3 - libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.2 libzstd/1.3.5.

Test case:

I made a test archive http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.zip which contains two files in a subdirectory. One file is executable, the other not.

Extracting directly:

wget -q -O archive_exevutable_bit_test.zip http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.zip
bsdtar -x -f archive_exevutable_bit_test.zip
ls -nl archive_exevutable_bit_test/*

shows

-rwxr-xr-x 1 1001 1001 35 Dec 11 13:38 archive_exevutable_bit_test/executable.sh
-rw-r--r-- 1 1001 1001 33 Dec 11 13:39 archive_exevutable_bit_test/non-executable.txt

The executable bit for executable.sh is present here.

Reading from stdin:

wget -q -O - http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.zip | bsdtar -x -f -
ls -nl archive_exevutable_bit_test/*

shows

-rw-r--r-- 1 1001 1001 35 Dec 11 13:38 archive_exevutable_bit_test/executable.sh
-rw-r--r-- 1 1001 1001 33 Dec 11 13:39 archive_exevutable_bit_test/non-executable.txt

The executable bit for executable.sh is thrown away here.

.tar-archive:

For comparison, with a .tar archive the executable bit is honoured even when reading from stdin:

wget -q -O - http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.tar | bsdtar -x -f -
ls -nl archive_exevutable_bit_test/*

shows

-rwxr-xr-x 1 1001 1001 35 Dec 11 13:38 archive_exevutable_bit_test/executable.sh
-rw-r--r-- 1 1001 1001 33 Dec 11 13:39 archive_exevutable_bit_test/non-executable.txt

Expected behaviour:

  • Permission handling should not depend on the source from which the archive is read.
  • Permission handling should be consistent across archive types.
@jsonn

Contributor

commented Dec 11, 2018

Zip archives contain two different ways to describe their content:
(1) A per-entry header
(2) A central directory at the end of the zip file.
libarchive (and bsdtar by extension) will use the central directory if seeking is possible on the input; otherwise it will fall back to the streaming-only logic. The two sources are not necessarily consistent, as you found out in your test case. There isn't really much we can or want to do about this. Note that you can replace wget with a plain cat and it will still show the same behavior.

The short version is that this is an inherent issue with streaming of zip files and something that won't be fixed.
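
The seekable-versus-streaming distinction can be illustrated with a minimal Python sketch. This is not libarchive's code; `can_seek` is a hypothetical helper, and the sketch assumes a POSIX system where seeking on a pipe fails. It only shows why a reader fed from a pipe cannot jump to the central directory at the end of the archive, while a reader given a regular file can.

```python
import os
import tempfile


def can_seek(fd):
    """Return True if the file descriptor supports lseek (hypothetical helper)."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        # On POSIX systems a pipe reports ESPIPE here.
        return False


# A regular file, as in "bsdtar -x -f archive.zip".
with tempfile.TemporaryFile() as f:
    file_seekable = can_seek(f.fileno())

# A pipe, as in "cat archive.zip | bsdtar -x -f -".
read_end, write_end = os.pipe()
pipe_seekable = can_seek(read_end)
os.close(read_end)
os.close(write_end)

print("regular file seekable:", file_seekable)
print("pipe seekable:", pipe_seekable)
```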

@jsonn jsonn closed this Dec 11, 2018

@felixbecker2

Author

commented Dec 11, 2018

According to http://unix.stackexchange.com/questions/487338#487371, this also happens if bsdtar itself created the ZIP archive. Shouldn't libarchive at least create consistent meta-information (per-entry header and central directory carrying the same information), so that archives created by libarchive are extracted correctly by libarchive? Maybe this is then a bug in libarchive, in that it creates ZIP archives with inconsistent information?

Is there any ZIP standard that says which information (per-entry header or central directory) is more trustworthy?

@jsonn

Contributor

commented Dec 11, 2018

bsdtar doesn't create the extension by default, it can be requested with --options zip:experimental.

@jsonn

Contributor

commented Dec 12, 2018

Because ISO files are not streamable in a meaningful way in most situations. File attributes, on the other hand, are often absent from zip files.

@kientzle

Contributor

commented Dec 15, 2018

As Joerg pointed out, there are basic limitations with some of the formats we deal with:

  • Tar files are always read in a streaming fashion, so always work the same way. If you need to work with streaming archives a lot, tar format is a good choice.
  • Zip files store file metadata in two different ways: partial metadata is stored with each entry; full metadata is stored at the end of the archive. Libarchive's Zip reader will seek to obtain full metadata if it can; otherwise it will use the partial metadata.
  • ISO allows file attributes to be stored before or after the entry. Libarchive's ISO reader will seek to obtain out-of-order metadata if it can; otherwise it will fail.

As a workaround, libarchive's Zip support includes an experimental extension (developed in conjunction with the Info-Zip maintainers) that puts more complete metadata with each entry. I hope to enable this by default at some point.

In theory, the streaming Zip reader could read the full metadata when it does get to the end and update all the files. This would require some careful rework of the Zip reader and probably changes to the logic that writes files to disk. In essence, every file would get "written to disk" twice: Once with full data and partial metadata, again with full metadata and no data.
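
As a rough analogue of this two-pass idea outside libarchive, the following Python sketch buffers a non-seekable stream in memory, extracts the file data, and then applies the permissions recorded in the central directory in a second pass. It uses Python's standard `zipfile` module, not libarchive; `extract_zip_stream` is a hypothetical helper, the demo archive is built in memory, and the sketch assumes a POSIX filesystem and an archive small enough to buffer.

```python
import io
import os
import stat
import tempfile
import zipfile


def extract_zip_stream(stream, dest):
    """Extract a ZIP from a non-seekable stream, then apply Unix modes."""
    # Buffering the whole stream makes the central directory reachable.
    archive = io.BytesIO(stream.read())
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        # Pass 1: write the file data.
        for info in zf.infolist():
            extracted.append((zf.extract(info, dest), info))
    # Pass 2: set modes explicitly from the central directory metadata
    # rather than relying on extract() to do so.
    for path, info in extracted:
        mode = stat.S_IMODE(info.external_attr >> 16)
        if info.create_system == 3 and mode:  # 3 = created on Unix
            os.chmod(path, mode)
    return [path for path, _ in extracted]


# Demo archive with one executable entry (stand-in for data arriving on stdin).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    entry = zipfile.ZipInfo("demo/executable.sh")
    entry.create_system = 3
    entry.external_attr = (stat.S_IFREG | 0o755) << 16
    zf.writestr(entry, "#!/bin/sh\necho hello\n")
demo.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    paths = extract_zip_stream(demo, tmp)
    restored_mode = stat.S_IMODE(os.stat(paths[0]).st_mode)

print(oct(restored_mode))
```

In real use the stream would be something like `sys.stdin.buffer`. Buffering trades memory for correctness, which is a different trade-off from the in-place rework of the streaming reader described above.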

@kientzle

Contributor

commented Dec 15, 2018

Is there any ZIP standard that says which information (per-entry header or central directory) is more trustworthy?

The Zip standard is here:
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

If you study this carefully, you'll notice that the file permissions are only stored in the central directory. All other metadata should be the same. The zip:experimental option adds an extension to the per-entry header which duplicates the file permissions that are present in the central directory.
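
To make the difference between the two headers concrete, here is a small Python sketch that builds a one-entry archive in memory, unpacks the fixed 30-byte local file header by hand, and reads the Unix mode from the central directory via the standard `zipfile` module. The archive contents and variable names are made up for illustration. The local header layout follows the fixed fields listed in APPNOTE.TXT, none of which carries external file attributes.

```python
import io
import stat
import struct
import zipfile

# Build a one-entry archive whose central directory records mode 0755.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("executable.sh")
    info.create_system = 3  # Unix: upper 16 bits of external_attr hold the mode
    info.external_attr = (stat.S_IFREG | 0o755) << 16
    zf.writestr(info, "#!/bin/sh\necho hello\n")
data = buf.getvalue()

# Fixed part of the local file header: signature, version needed, flags,
# compression method, mod time, mod date, CRC-32, compressed size,
# uncompressed size, file name length, extra field length.
LOCAL_FMT = "<4s5H3L2H"
local_fields = struct.unpack_from(LOCAL_FMT, data, 0)
local_sig = local_fields[0]
name_len = local_fields[9]
header_size = struct.calcsize(LOCAL_FMT)
local_name = data[header_size:header_size + name_len]

# The central directory entry, as parsed by zipfile, does carry the mode.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    central = zf.getinfo("executable.sh")
central_mode = stat.S_IMODE(central.external_attr >> 16)

print("local header fields:", len(local_fields), "name:", local_name)
print("central directory mode:", oct(central_mode))
```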
