-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support tarfile.PAX_FORMAT in shutil.make_archive #74846
Comments
shutil.make_archive currently just uses the default tar format, which is GNU_FORMAT. This format doesn't ensure that all character paths are encoded as UTF-8, and hence may end up embedding platform specific encoding assumptions into the generated tarball. I see a few possible ways of resolving this:
|
The main benefit I'd see to the last option is that it would also cover passing a "filter" option for tarfile.TarFile.add(). Dropping down to the lower level API for that isn't *hard*, it's just a bit fiddly (note: currently untested example code): sdist = tarfile.open(sdist_path, "w:gz", format=tarfile.PAX_FORMAT)
sdist.add(os.getcwd(), arcname=sdist_subdir, filter=_exclude_hidden_and_special_files) |
Aye, I agree that changing the default resolves the feature request here. I've recategorised this as a documentation issue, as the initial PR only changed the So the changes needed will be:
An addition to the "Porting" section in What's New may also be needed, depending on how tarfile.Tarfile behaves if you tell it to open a PAX_FORMAT archive using GNU_FORMAT or vice-versa (tarfile.open and shutil.unpack_archive will be fine, since they query the file's own metadata to find out which format to use) |
I opened a PR to implement both those changes, and also added some minor related clarifications and fixes to the format section of the tarfile docs.
I tested tarfile.Tarfile() and extract_all() on the resulting object with several different simple- to moderately-complex (including Unicode filenames) real-world pax- and GNU-format archives packed with different archivers, with both format=GNU_FORMAT and format=PAX_FORMAT for each one, got no warnings or errors with debug=3 and errorlevel=2, and extraction was successful and yielded identical results for either format argument, and did not get a PAXHEADERS file output for either one. Furthermore, tracing the code, its not clear that Tarfile() (with 'r') and extract, etc. use the passed Even if so, in order to produce an error after this change but not before, all of the following would seem to have to be the case:
Therefore, this seems like a very specific corner-case. However, if you think I should include it, I'll go ahead with it. Also, let me know if these doc changes should have a separate NEWS entry or the previous one adequately covers it. |
tarfile does not use the |
Thanks for the confirmation! |
Thanks for the technical clarification Lars, and for the docs update C.A.M. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: