-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change default tar format to modern POSIX 2001 (pax) for better portability/interop, support and standards conformance #80449
Comments
I propose changing tarfile.DEFAULT_FORMAT to be tarfile.PAX_FORMAT , rather than the legacy tarfile.GNU_FORMAT for Python 3.8. This would offer several benefits: • Removes limitations of the old GNU tar format, including in max UID/GID values and bits in device major and minor numbers, and is the most flexible and feature-rich tar format currently This change would have no effect on reading existing archives, only writing new ones, and should be broadly compatible with any remotely modern system, as pax support is included in all the widely used libraries/systems:
Furthermore, essentially every existing archiver supports ustar format, which would allow interoperability on very old/exotic platforms that don't support pax for some reason (and would certainly not support GNU). Therefore, it should be more than safe to make the change now, with archivers on the three major platforms supporting the modern standard for nearly a decade, and any esoteric ones at least as likely to support the POSIX standard as the vendor-specific GNU extension. Is there any particular reason why we shouldn't make this change? Is there a particular group/list I should contact to follow up about seeing this implemented? It seems it should only require a one-line change here, aside from updating the docs and presumably the NEWS file, which I would be willing to do (I would think it should make a fairly straightforward first contribution). |
Looks reasonable. Do you know whether it is supported on OpenBSD and NetBSD? In other popular programming languages? |
In general, since pax is a backwards-compatible superset of the standard, portable ustar unlike the vendor-specific GNU format that even GNU tar itself no longer recommends in favor of switching to pax by default, it is to my understanding essentially always the better choice. The only exception would be systems that support GNU tar but not POSIX 2001 and where the limitations of the old ustar must be bypassed, which as far I'm aware is basically just really old (>10-15 years) GNU/Linux. NetBSD and OpenBSD both use bsdtar implementations, which as far as I could find means they support the POSIX 2001-standard pax format, and (unless they use libarchive which supports all three) likely *don't* support the current GNU format which is specific to GNU tar. Even if they don't, their ustar support means they can read pax archives as legacy ustar archives (as pax is backwards-compatible), while the same is not necessarily true of GNU tar archives. Therefore, pax is strictly a better choice than GNU or ustar. Most other programming languages I could find did not have internal/standard library implementations, instead relying on the aforementioned libraries or varying third party packages:
|
Do you mind to create a PR? |
Sure, in work now. Its my first contribution to CPython, so bear with me. I presume this is too trivial to go in the What's New in Python article, but does merit a NEWS entry so users are aware of the change? Aside from changing this line, updating the documentation to reflect the change, and possibly adding a NEWS entry, is there anything else that needs to be done? Thanks. |
PR is up with CI checks green as GH-12355. I also had to fix one test which implicitly assumed that DEFAULT_FORMAT == GNU_FORMAT. |
Also, one additional minor note (since I apparently can't edit comments here). Windows 10 (since the April 2018 update a year ago) now includes libarchive-based bsdtar built-in by default and accessible from the standard command prompt, which as mentioned fully supports pax. Therefore, all modern platforms should support extracting them out of the box (aside from Windows 7/Server 2008, for which extended support will end within two months from Python 3.8's initial release, Windows 10 pre-1803 for which enterprise support will end a few months after that, and Windows 8.1/Server 2012, which will be in extended support for a few more years but very low enterprise/developer/power user adoption; of course, these don't include any built-in tar support at all anyway). |
Thank you for your contribution! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: