Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile extractall needlessly re-wraps ZipInfo instances #76923

Open
peterbe mannequin opened this issue Feb 1, 2018 · 3 comments
Open

zipfile extractall needlessly re-wraps ZipInfo instances #76923

peterbe mannequin opened this issue Feb 1, 2018 · 3 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@peterbe
Copy link
Mannequin

peterbe mannequin commented Feb 1, 2018

BPO 32742
Nosy @serhiy-storchaka, @peterbe
PRs
  • bpo-32742: use infolist for zipfile.ZipFile.extractall #5472
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-02-01.15:10:53.067>
    labels = ['3.8', 'type-feature', 'library']
    title = 'zipfile extractall needlessly re-wraps ZipInfo instances'
    updated_at = <Date 2018-03-27.16:53:40.745>
    user = 'https://github.com/peterbe'

    bugs.python.org fields:

    activity = <Date 2018-03-27.16:53:40.745>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2018-02-01.15:10:53.067>
    creator = 'peterbe'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32742
    keywords = ['patch']
    message_count = 3.0
    messages = ['311434', '311435', '314539']
    nosy_count = 2.0
    nosy_names = ['serhiy.storchaka', 'peterbe']
    pr_nums = ['5472']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue32742'
    versions = ['Python 3.8']

    @peterbe
    Copy link
    Mannequin Author

    peterbe mannequin commented Feb 1, 2018

    The ZipFile class as a extractall method [0] that allows you to leave the 'members' empty. If empty, the 'members' becomes a list of all the names of files in the zip. Then it iterates over the names as sends each to self._extract_member. But that method needs it to be a ZipInfo object instead of a file name, so it re-wraps it [2].

    Instead we can use self.infolist() to avoid that re-wrapping inside each self._extract_member call.

    [0] hhttps://github.com/python/cpython/blob/12e7cd8a51956a5ce373aac692ae6366c5f86584/Lib/zipfile.py#L1579
    [1]

    members = self.namelist()

    [2]

    cpython/Lib/zipfile.py

    Lines 1615 to 1616 in 12e7cd8

    if not isinstance(member, ZipInfo):
    member = self.getinfo(member)

    @peterbe peterbe mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 1, 2018
    @peterbe
    Copy link
    Mannequin Author

    peterbe mannequin commented Feb 1, 2018

    (PS. I'm new to filing Python bugs and submitting patches. I *think* this is the right version. I've only been looking at 'master'.)

    @peterbe peterbe mannequin added the 3.8 only security fixes label Feb 1, 2018
    @serhiy-storchaka
    Copy link
    Member

    There is an obscure behavior change introduced by this PR. Technically a ZIP file can contain several entries with the same name. Currently extractall() will extract the last of them multiple times. The result doesn't differ from when extract it only once, there is just a waste of time. After merging this PR extractall() will extract different entries to the same location. If one of the is a directory, and other is a file, extractall() will fail.

    I'm not sure it can be considered a valid ZIP file, but currently the zipfile module supports it.

    On other hand, what is the benefit of using self.infolist() instead of self.namelist()?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    Status: No status
    Development

    No branches or pull requests

    1 participant