Large GIDs/UIDs (over 256k) can't be represented with libarchive with USTAR format; caveat not noted #885

Closed
ngie-eign opened this issue Mar 15, 2017 · 3 comments

Comments

@ngie-eign
Contributor

ngie-eign commented Mar 15, 2017

Isilon OneFS supports UIDs > 256k, in part for Windows compatibility. This unfortunately causes us some grief when using libarchive to write tar archives in the USTAR format, as the limits are hardcoded here:

    #define USTAR_uid_offset 108
    #define USTAR_uid_size 6
    #define USTAR_uid_max_size 8
    #define USTAR_gid_offset 116
    #define USTAR_gid_size 6
    #define USTAR_gid_max_size 8

If the answer to my support request is "don't use raw USTAR; use ACLs/ACEs/extended attributes" or "use the pax format", that's ok -- it just might affect the course of some internal development.

It would be good (regardless) if this caveat were mentioned in the libarchive-formats manpage (along with other similar caveats), as it's not currently listed.

@jsonn
Contributor

jsonn commented Mar 16, 2017

Yes, the answer is "don't use USTAR if you have large numeric fields". tar(5) documents that the UID/GID field is 8 octets, including the terminating NUL, and is written as an octal number. That leaves 7 octal digits of 3 bits each, so only values below 2^21 are supported.
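The arithmetic behind that limit can be sketched in a few lines of C. This is only an illustration of the field-width math, not libarchive's code; the helper names `octal_field_max` and `format_octal_field` are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Largest value that fits in a NUL-terminated octal field that is
 * `field_bytes` bytes wide (hypothetical helper, not a libarchive API). */
static uint64_t octal_field_max(unsigned field_bytes)
{
    unsigned digits = field_bytes - 1;          /* one byte holds the NUL */
    return ((uint64_t)1 << (3 * digits)) - 1;   /* 3 bits per octal digit */
}

/* Write `value` as zero-padded octal into `field`.
 * Returns 0 on success, -1 if the value does not fit in the field. */
static int format_octal_field(char *field, unsigned field_bytes, uint64_t value)
{
    if (field_bytes < 2 || field_bytes > 21)
        return -1;                              /* keep the shift above in range */
    if (value > octal_field_max(field_bytes))
        return -1;                              /* too large for this width */
    snprintf(field, field_bytes, "%0*llo",
             (int)(field_bytes - 1), (unsigned long long)value);
    return 0;
}
```

With an 8-byte field, `octal_field_max(8)` is 2097151 (2^21 - 1). With only 6 digits, `octal_field_max(7)` is 262143, which is where the 256k figure in the issue title comes from, assuming the title refers to the 6-digit `USTAR_uid_size`.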

If you want the documentation to be more explicit, please submit patches.

@kientzle
Contributor

The ustar format is defined by POSIX. Libarchive's implementation attempts to follow the standard as closely as possible, which unfortunately includes limits on file size, UID, GID, and other numeric fields.

The pax format was designed by POSIX to be a successor to ustar that provides more complete support for modern systems, including eliminating most numeric limits. Libarchive's "restricted pax" mode was specifically designed to use pax extensions only as necessary; this produces archives that support large UIDs, GIDs, and file sizes while still providing a good level of interoperability with ustar implementations.
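To make "pax extensions only as necessary" concrete, here is a rough sketch of the idea. It is not libarchive's implementation. A pax extended header record has the form `<length> <keyword>=<value>\n`, where the decimal length counts the whole record, including its own digits. The helper names and the 7-digit threshold (taken from the 8-octet field described above) are assumptions for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed ustar limit: 7 octal digits in an 8-byte field. */
#define SKETCH_USTAR_ID_MAX 07777777ULL

/* Build one pax record "<len> <key>=<value>\n" into `out`.
 * Returns the record length, or -1 if `out` is too small. */
static int pax_record(char *out, size_t out_size,
                      const char *key, const char *value)
{
    size_t body = strlen(key) + strlen(value) + 3;  /* ' ', '=', '\n' */
    size_t total = body + 1;                        /* first guess: 1 digit */
    for (;;) {
        /* The length field counts its own digits, so iterate to a fixed point. */
        size_t digits = (size_t)snprintf(NULL, 0, "%zu", total);
        if (body + digits == total)
            break;
        total = body + digits;
    }
    if (total + 1 > out_size)
        return -1;
    snprintf(out, out_size, "%zu %s=%s\n", total, key, value);
    return (int)total;
}

/* Emit a pax "uid" record only when the UID cannot fit in the ustar field.
 * Returns 0 if no extension is needed, else the record length (or -1). */
static int uid_extension(uint64_t uid, char *out, size_t out_size)
{
    char value[32];
    if (uid <= SKETCH_USTAR_ID_MAX) {
        if (out_size > 0)
            out[0] = '\0';                          /* plain ustar suffices */
        return 0;
    }
    snprintf(value, sizeof value, "%llu", (unsigned long long)uid);
    return pax_record(out, out_size, "uid", value);
}
```

A UID such as 1000 produces no extension record, so the entry stays readable by plain ustar tools. A UID of 3000000 produces the record `15 uid=3000000` followed by a newline.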

As @jsonn mentioned, if you have suggestions for improving any of our documentation, we would appreciate a pull request.

@kientzle
Contributor

To repeat one point:

You should use libarchive's "restricted pax" format via archive_write_set_format_pax_restricted(). This was designed to balance ustar compatibility with support for expanded values. Tar archives created in this way will support large UIDs, GIDs, long file names, etc., while preserving (as far as possible) compatibility with existing ustar implementations.

In contrast, libarchive's archive_write_set_format_ustar() follows the POSIX standard very closely and rejects or truncates data that cannot be stored precisely according to that standard.
