bpo-20907: shutil._unpack_zipfile add warnings for skipped files #29910

Open: wants to merge 8 commits into base: main
4 changes: 4 additions & 0 deletions Doc/library/shutil.rst
@@ -650,6 +650,10 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
registered for that extension. In case none is found,
a :exc:`ValueError` is raised.

Note that with *zip* format, absolute paths and paths containing a ``..``
component, are not extracted. If you need such paths extracted, consider
using :func:`ZipFile.extractall`.

.. audit-event:: shutil.unpack_archive filename,extract_dir,format shutil.unpack_archive

.. warning::
8 changes: 8 additions & 0 deletions Lib/shutil.py
@@ -10,6 +10,7 @@
import fnmatch
import collections
import errno
import logging

try:
    import zlib
@@ -1212,12 +1213,14 @@ def _unpack_zipfile(filename, extract_dir):
        raise ReadError("%s is not a zip file" % filename)

    zip = zipfile.ZipFile(filename)
    skipped = 0
    try:
        for info in zip.infolist():
            name = info.filename

            # don't extract absolute paths or ones with .. in them
            if name.startswith('/') or '..' in name:
                skipped += 1
                continue

            targetpath = os.path.join(extract_dir, *name.split('/'))
@@ -1231,6 +1234,11 @@ def _unpack_zipfile(filename, extract_dir):
                    open(targetpath, 'wb') as target:
                copyfileobj(source, target)
    finally:
        if skipped:
            import logging
            logging.getLogger(__file__)
            logging.warning(f'unpack {filename}: {skipped} file(s) skipped'
                            ' (due to absolute path or `..` path component)')
Member

Warnings are a mechanism for a module author to communicate about bad usage to other developers calling the code, not to communicate with end-users: https://docs.python.org/3/howto/logging.html#when-to-use-logging

Contributor Author

@akulakov Dec 4, 2021

So using logging.warning() here should be appropriate?

Edit: I'm not really sure between logging.info, warnings.warn and logging.warning. This function can be used in both libraries and end user apps.

  • it's not necessarily true (but can be) that the client app needs to be modified -- for example, if it needs to always extract all files, it can be changed to use the zipfile module
  • it's also not true that the client app cannot do anything about it -- it can potentially use the zipfile module, a 3rd party module, or custom logic.
  • logging.info seems not specific enough because we're warning somebody (user or developer) that files are skipped.

This might be an edge case that falls between warnings.warn and logging.warning? If so, I don't mind using one that you prefer.
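
For context on the "use the zipfile module instead" option mentioned above, here is a minimal sketch of how `zipfile.ZipFile.extractall` treats a member whose name contains a `..` component. The archive is built in memory and the member names (`ok.txt`, `../evil.txt`) are made up for illustration. As documented for `zipfile`, such components are stripped rather than the member being skipped, so the file should land inside the target directory.

```python
import io
import os
import tempfile
import zipfile

# Build a small archive in memory; the member names are illustrative only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('ok.txt', 'fine')
    zf.writestr('../evil.txt', 'outside?')

with tempfile.TemporaryDirectory() as target:
    with zipfile.ZipFile(buf) as zf:
        # extractall() sanitizes member names instead of skipping them.
        zf.extractall(target)
    extracted = sorted(os.listdir(target))

print(extracted)
```

So switching to `ZipFile.extractall` changes the behaviour from "skipped" to "extracted under a sanitized name", which is not quite the same as extracting to the original path.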

Contributor Author

I looked for similar usages in the library and found this:

if len(comment) > ZIP_MAX_COMMENT:

it's very similar in that both cases warn about something being lost when creating / unpacking a zip file: some contained files and a part of a comment respectively. That warning was added in 2014.
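
A small sketch of the existing `zipfile` behaviour referred to above, assuming a current CPython where the `ZipFile.comment` setter performs the length check. It sets a comment one byte over `zipfile.ZIP_MAX_COMMENT` on an in-memory archive and records the resulting warning.

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    with zipfile.ZipFile(buf, 'w') as zf:
        # One byte longer than the limit, so the comment gets truncated.
        zf.comment = b'x' * (zipfile.ZIP_MAX_COMMENT + 1)
        stored_length = len(zf.comment)

messages = [str(w.message) for w in caught]
print(stored_length, messages)
```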

Member

Good find! @serhiy-storchaka added that change, maybe he can comment here.

Member

I just replaced existing prints with warnings. I am not sure that it is a good solution, but it at least gives the user some control: warnings can be printed, silenced or converted to exceptions.
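
To illustrate the "silenced or converted to exceptions" point, here is a minimal sketch using the standard `warnings` filters. `unpack_stub` is a hypothetical stand-in for a call that skipped some archive members; it is not part of `shutil`.

```python
import warnings


def unpack_stub(skipped):
    """Hypothetical stand-in for an unpack call that skipped some members."""
    if skipped:
        warnings.warn(f'{skipped} file(s) skipped', UserWarning, stacklevel=2)
    return skipped


# Silenced: with an 'ignore' filter nothing is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('ignore')
    unpack_stub(1)
silenced = list(caught)

# Converted to an exception: with an 'error' filter the warning is raised.
with warnings.catch_warnings():
    warnings.simplefilter('error')
    try:
        unpack_stub(1)
    except UserWarning as exc:
        raised = str(exc)
    else:
        raised = None
```

The same control is available from the command line through `-W` or the `PYTHONWARNINGS` environment variable.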

Contributor Author

I found another example where a warning would be for the end user rather than the app author: https://github.com/pganssle/zoneinfo/blob/c54d700fcba78b20733390ee3014809ad69858f6/src/backports/zoneinfo/_tzpath.py#L58

-- this one is from 2020 by Paul Ganssle

However, for this use case I don't see any issue with using logging.warning or logging.error, but I wonder which of those would be more appropriate. It fits the description of logging.error more closely, but at the same time treating it as an "error" might be too strong.

Contributor

In general logging.warning would be cleaner here than using the warnings module. As Éric says, this is a data-driven problem that is of interest to the end-user, less so to the developer of the application.

Contributor

@akulakov you could in fact ask on python-dev about the general practice if you want feedback; but I don't think this is particularly controversial: libraries can log things to end-users, and many do. I'd just use the logging module here, and as a follow-up task, fix the other warnings call you found in zipfile.

Contributor Author

@ambv: I agree; I was going to update it but got a little distracted -- will push an update today or tomorrow. Thanks for looking at this as well!

Contributor Author

A few functions in shutil.py accept logger as an arg and use it to print info/debug logs. I think in this case there's no reason to follow this pattern because we want the warnings to be displayed by default.
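
One observation on the diff as shown: the result of `logging.getLogger(__file__)` is not assigned, so the `logging.warning()` call that follows goes to the root logger. A possible alternative, sketched here as an assumption and not as what the PR does, is a module-level logger named with `__name__`. `report_skipped` is a hypothetical helper name. With no logging configuration at all, a WARNING from such a logger should still reach stderr through logging's last-resort handler, so it stays visible by default.

```python
import logging

# Assumption: a module-level logger, unlike the root-logger call in the diff.
logger = logging.getLogger(__name__)


def report_skipped(filename, skipped):
    """Hypothetical helper: emit one WARNING if any members were skipped."""
    if skipped:
        logger.warning('unpack %s: %d file(s) skipped'
                       ' (due to absolute path or `..` path component)',
                       filename, skipped)
```

An application could then silence or reroute just these messages by configuring that one named logger, without touching the root logger.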

        zip.close()

def _unpack_tarfile(filename, extract_dir):
8 changes: 8 additions & 0 deletions Lib/test/test_shutil.py
@@ -1686,6 +1686,14 @@ def check_unpack_archive_with_converter(self, format, converter):
        self.assertRaises(shutil.ReadError, unpack_archive, converter(TESTFN))
        self.assertRaises(ValueError, unpack_archive, converter(TESTFN), format='xxx')

    def test_unpack_archive_zip_warn_skipped(self):
        tmpdir2 = self.mkdtemp()
        with self.assertLogs(level='WARNING') as cm:
            fn = support.findfile("testzip.zip")
            unpack_archive(pathlib.Path(fn), pathlib.Path(tmpdir2))
        self.assertIn('1 file(s) skipped', cm.output[0])
        self.assertEqual(rlistdir(tmpdir2), ['test'])

    def test_unpack_archive_tar(self):
        self.check_unpack_archive('tar')

Binary file added Lib/test/testzip.zip
Binary file not shown.
@@ -0,0 +1,2 @@
Added warning for skipped files in :func:`shutil.unpack_archive` using *zip*
format.
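
The new test depends on a binary fixture (`testzip.zip`) whose contents are not shown. The sketch below is a self-contained way to exercise the same idea with `assertLogs`. `unpack_zip_logging` is a hypothetical stand-in that mirrors the skip-and-warn logic from the diff, since this PR is still open and `shutil.unpack_archive` itself may not log anything. The archive is built in memory with made-up member names.

```python
import io
import logging
import os
import tempfile
import unittest
import zipfile


def unpack_zip_logging(fileobj, extract_dir):
    """Hypothetical stand-in mirroring the skip-and-warn logic in the diff."""
    skipped = 0
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            name = info.filename
            # Same check as the diff: absolute paths or any '..' substring.
            if name.startswith('/') or '..' in name:
                skipped += 1
                continue
            zf.extract(info, extract_dir)
    if skipped:
        logging.warning('%d file(s) skipped', skipped)


class SkippedWarningTest(unittest.TestCase):
    def test_warns_and_skips(self):
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, 'w') as zf:
            zf.writestr('test', 'kept')
            zf.writestr('../outside', 'skipped')
        with tempfile.TemporaryDirectory() as tmp:
            # assertLogs() captures records sent to the root logger.
            with self.assertLogs(level='WARNING') as cm:
                unpack_zip_logging(buf, tmp)
            self.assertIn('1 file(s) skipped', cm.output[0])
            self.assertEqual(os.listdir(tmp), ['test'])
```

Note that the `'..' in name` check is a substring test, so a member named something like `notes..txt` would also be counted as skipped.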