-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add xz support to shutil #49661
Comments
Here's a patch that adds support for xz compression: |
Good idea ! are you able provide a unit test for the changes made into the two |
hmm, I'm unsure about how this should be done.. I guess such a test would belong in Lib/distutils/test_dist.py, but I'm |
no, rather in test_bdist_rpm, test_sdist and so on, |
distutils2 uses tarfile now instead of external programs. Adding dependency on another bug. |
FTR, the distutils2 repo is http://bitbucket.org/tarek/distutils2. distutils2.tests.support contains helper to create temp directories and run commands; see docstrings and example uses in the tests for more info. |
distutils2/packaging now just uses the shutil functions. I’ll make a patch for shutil after tarfile is updated. |
This not-so-bad patch adds lzma compression support to the shutil functions, under the 'xztar' name. Please review. |
I should add that a doc update will be part of the next patch. In parallel, I’ve also started updating packaging to add lzma support to the bdist command, but the situation is very disappointing: tarfile knows what format it supports, shutil has its own list, and packaging has a few duplicates too! Ideally, tarfile should have an attribute to expose what compression formats are available, then shutil would reuse that, and packaging would just call shutil. (I’m not even talking about the duplication in the doc.) If you agree with the general idea, I’ll open reports to a) add the attribute to tarfile b) rework the (luckily internal) registries in shutil to stop being hard-coded c) remove packaging’s registries. |
Functionally, the patch looks good to me. There are some docstrings that
+1 |
a) is bpo-14013; b) is bpo-14011. If Lars agrees to bpo-14013, I will withdraw this patch in favor of another one that would make shutil automatically support all compressors that tarfile supports (my b) point). Note that this is not going to be pain-free, as for example bzip2 compression is obtained with the strings “bz2”, “bzip2”, “bzip” or “bztar” depending on the modules; I think it’s worth it nonetheless, even if I have to add an ugly conversion in shutil for compat. |
I have a working updated shutil module, tests pass and the documentation is improved. I will make sure to make different commits for improving the tests, cleaning up some things, adding tarfile.compression_formats and removing duplication in shutil (see dep bugs), to be sure not to introduce regressions. Before I finish and post the patch, I’d like feedback on a choice I made. I don’t think it’s possible to have shutil automatically support all the formats that tarfile does, because of the spelling issue I mentioned. Here’s what the code would look like (let me know if I should post it elsewhere or as a file to let you get syntax coloring): _ARCHIVE_FORMATS = {} for fmt in tarfile.compression_formats:
code = fmt + 'tar'
ext = '.' + fmt
desc = "%s'ed tar file" % fmt
_ARCHIVE_FORMATS[code] = (_make_tarball, [('compress', fmt)], desc)
_UNPACK_FORMATS[code] = ([ext], _unpack_tarfile, [], desc)
# kludge to add alternate extension
if 'gztar' in _ARCHIVE_FORMATS:
_UNPACK_FORMATS['gztar'][0].append('.tgz')
# XXX desc should say "gzip'ed tar file", not "gz'ed"
# rectify naming incompatibility
if 'bz2tar' in _ARCHIVE_FORMATS:
# XXX alternative: make 'bztar' alias for 'bz2tar', but would complicate
# manipulating the registries
del _ARCHIVE_FORMATS['bz2tar']
del _UNPACK_FORMATS['bz2tar']
desc = "bzip2'ed tar file"
_ARCHIVE_FORMATS['bztar'] = (_make_tarball, [('compress', 'bz2')], desc)
_UNPACK_FORMATS['bztar'] = (['.bz2'], _unpack_tarfile, [], desc) # now add uncompressed tar and zip file I really don’t like that code. Given that tarfile is not extensible at run time, it is not a big deal to have to update shutil whenever we add a compression format to tarfile. Therefore, I backtracked on my “automatic support” idea but kept a lot of cleanup in did in the code. Here’s what the code looks like: _ARCHIVE_FORMATS = {}
_UNPACK_FORMATS = {}
if 'xz' in tarfile.compression_formats:
desc = "xz'ed tar file"
# XXX '.xz' is not great, '.tar.xz' would be better
_ARCHIVE_FORMATS['xztar'] = (_make_tarball, [('compress', 'xz')], desc)
_UNPACK_FORMATS['xztar'] = (['.xz'], _unpack_tarfile, [], desc)
if 'bz2' in tarfile.compression_formats:
desc = "bzip2'ed tar file"
_ARCHIVE_FORMATS['bztar'] = (_make_tarball, [('compress', 'bz2')], desc)
_UNPACK_FORMATS['bztar'] = (['.bz2'], _unpack_tarfile, [], desc)
if 'gz' in tarfile.compression_formats:
desc = "gzip'ed tar file"
_ARCHIVE_FORMATS['gztar'] = (_make_tarball, [('compress', 'gz')], desc)
_UNPACK_FORMATS['gztar'] = (['.gz', '.tgz'], _unpack_tarfile, [], desc) So, do you agree that “not automated but not ugly” is better than “automated with ugly klutches”? |
s/cleanup in did in the code/cleanup I did in the code/ |
Note that there is a way to get fully automated support for tar formats: tarfile could expose, in addition to the list compression_formats, another structure with the descriptions (e.g. “gzip’ed tar file”) and file extensions (e.g. ['.gz', '.tgz'] —no, it’s not '.tar.gz', which is unfortunate, and could cause Lars to reject that idea). I’m just putting this here for reference, but my preference is still for the second idea I talk about in my precedent message. |
Definitely. If we have to add special cases that are almost as long as
As much as it would be nice to have all of the information centralized in I think we can restructure the code a bit to reduce the work involved in _ARCHIVE_FORMATS = {}
_UNPACK_FORMATS = {}
By the way, is there any reason why you've used ".gz" instead of Also, I notice that _make_tarball() will need to be updated to add the If we did this, there would be no need for a separate "progname" field
|
For the "xztar" format, you should also perhaps recognize the ".txz" |
|
I have not seen this issue and was created a new bpo-16313 with almost same patch as Eric's one (only I have changed the documentation too). Here's the patch. I wonder that it was not committed yet to 3.3. I think it would be better first to add xz support and the code cleanup to do then (if it takes so much time). |
Oh, I see the '.bz2' bug. Patch updated. |
Éric, what’s your take on this approach (not code)? We have time enough till 3.4 but it seems this doesn't really move forward. Any thoughts how to get this moving? Unfortunately I'm not invested enough in this to make a educated decision. |
I will upload my patch and compare it with Serhiy’s. Now that 3.3.0 is released, there is no hurry. |
3.4 beta 1 will be soon. |
Serhiy's patch needs a "versionchanged" or "versionadded" tag in the Docs. |
Done. |
Now 3.4 is released, but shutil still doesn't support the xz compression. I think we should hurry on to be in time for 3.5. |
Your patch looks ok to me. Just update the version number in the docs ;) |
Éric? |
I’m afraid I changed computers once or twice since I worked on that, so I can’t readily find my clone and get the latest patch. I know where the hard drives are but I can’t say when I will have time to search them. |
New changeset e57db221b6c4 by Serhiy Storchaka in branch 'default': |
Thank you Serhiy! You need to fix the version number in the "versionchanged". |
New changeset a47996c10579 by Serhiy Storchaka in branch 'default': |
Oh, my fault. Thank you Antoine! |
sphinx generates warning for the current docs introduced by this issue: WARNING: Explicit markup ends without a blank line; unexpected unindent. I've uploaded a documentation patch that fixes it. |
Fixed. Thanks for the patch, Akira. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: