ZipExtFile in zipfile can be seekable #67097

IridiumYang mannequin opened this issue Nov 20, 2014 · 7 comments

3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement



IridiumYang mannequin commented Nov 20, 2014

BPO 22908
Nosy @gpshead, @serhiy-storchaka, @jjolly
  • bpo-22908: Add seek and tell functionality to ZipExtFile #4966
  • Files
  • zipfile.diff
  • zip-in-zip test program
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = ''
    closed_at = <Date 2018-01-30.08:53:26.935>
    created_at = <Date 2014-11-20.15:20:43.347>
    labels = ['3.7', 'type-feature', 'library']
    title = 'ZipExtFile in zipfile can be seekable'
    updated_at = <Date 2018-01-30.08:53:26.933>
    user = '' fields:

    activity = <Date 2018-01-30.08:53:26.933>
    actor = 'gregory.p.smith'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2018-01-30.08:53:26.935>
    closer = 'gregory.p.smith'
    components = ['Library (Lib)']
    creation = <Date 2014-11-20.15:20:43.347>
    creator = 'Iridium.Yang'
    dependencies = []
    files = ['37237', '47345']
    hgrepos = []
    issue_num = 22908
    keywords = ['patch']
    message_count = 7.0
    messages = ['231438', '231446', '231472', '256683', '268764', '308935', '311254']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'jae', 'serhiy.storchaka', 'Iridium.Yang', 'dkessel', 'jjolly']
    pr_nums = ['4966']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = ''
    versions = ['Python 3.7']


    IridiumYang mannequin commented Nov 20, 2014

    The ZipExtFile class in the zipfile module does not provide a seek method the way GzipFile does. As a result, it is hard to work with a member's contents without extracting all of it.
    For example, consider a very large tar file compressed with zip. The tarfile module can operate on a file object, but it needs a seek method, so the ZipExtFile instance returned from ZipFile cannot be passed to TarFile.
    This may seem strange, but I encountered it with Samsung firmware.

    In fact, a seek method like the one in GzipFile or some other compressed formats can be implemented in zipfile very easily. Here is my naive modification (nearly the same as in GzipFile).

    @IridiumYang IridiumYang mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Nov 20, 2014

    I'm -1 on adding a seek method with linear complexity. This looks like an attractive nuisance to me. It would be better to just make TarFile work with non-seekable streams.


    Actually, TarFile already works with non-seekable streams. Use mode='r|*' or similar.

    On the other hand, I'm not against making non-compressed ZipExtFile seekable. It can be helpful when a ZIP file is used just as a container for other files.
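
The stream-mode suggestion above can be sketched as follows. This is a minimal illustration, not code from the issue: the archive is built in memory, and the names `build_zip_containing_tar`, `read_tar_members_streaming`, `firmware.tar` and `hello.txt` are made up for the example. It assumes the tar member is read while iterating, since stream mode only moves forward.

```python
import io
import tarfile
import zipfile


def build_zip_containing_tar() -> bytes:
    """Build a small zip archive holding one tar file (illustrative data only)."""
    payload = b"hello from inside the tar"
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("firmware.tar", tar_buf.getvalue())
    return zip_buf.getvalue()


def read_tar_members_streaming(zip_bytes: bytes, member: str) -> dict:
    """Read every regular file of a tar stored in a zip without calling seek()."""
    contents = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(member) as zip_member:
            # "r|*" selects stream mode: tarfile only reads forward, so the
            # underlying file object does not need to be seekable.
            with tarfile.open(fileobj=zip_member, mode="r|*") as tar:
                for info in tar:
                    if info.isfile():
                        # In stream mode each member must be read before
                        # moving on to the next one.
                        contents[info.name] = tar.extractfile(info).read()
    return contents


members = read_tar_members_streaming(build_zip_containing_tar(), "firmware.tar")
```

The trade-off is that stream mode gives up random access: members have to be consumed in archive order.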

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 21, 2014

    dkessel mannequin commented Dec 18, 2015

    It would be great to have the ZipExtFile class be seekable.
    This would help in implementing features in other projects.

    For example, pydicom would gain the ability to read from ZIP files, as mentioned in pydicom/pydicom#219


    jae mannequin commented Jun 18, 2016

    To add to this (without looking at the patch): I just learned, to my astonishment, that a ZipExtFile doesn't even support tell(). I can understand the seek being nontrivial... but the tell? It's a bytestream, and there is (isn't there?) a clear definition of which byte a read(1) would deliver next. It should be trivial to keep track of the (only ever increasing) file position.
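
The position tracking described above could look roughly like this. It is a sketch of the idea only: `CountingReader` is a hypothetical wrapper written for this illustration, not part of zipfile, and it counts decompressed bytes handed back by `read()`.

```python
import io
import zipfile


class CountingReader:
    """Hypothetical wrapper that adds tell() to a forward-only byte stream."""

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0  # number of bytes returned to the caller so far

    def read(self, size=-1):
        data = self._raw.read(size)
        self._pos += len(data)  # the position only ever increases
        return data

    def tell(self):
        return self._pos


# Illustrative data: a single compressed member in an in-memory archive.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", b"0123456789" * 10)

with zipfile.ZipFile(archive) as zf:
    with zf.open("data.bin") as member:
        reader = CountingReader(member)
        first = reader.read(5)
        position_after_first_read = reader.tell()
        reader.read()  # drain the rest of the member
        position_at_end = reader.tell()
```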


    jjolly mannequin commented Dec 22, 2017

    Please be gentle, this is my first submission to python.

    The use case for me was a recursive zip-within-a-zip situation. I wanted to allow the creation of a zipfile.ZipFile from an existing zipfile.ZipExtFile, but the lack of seek prevented this.

    I simply treated forward seeks as a read, and backward seeks as a reset-and-read. The reset was the tricky part as it required restoring several original values such as the remaining compressed length, remaining data length, and the running crc32.
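
The forward-seek-as-read and backward-seek-as-reset-and-read idea can be sketched outside of zipfile as below. This is not the submitted patch: the patch restores internal state such as the remaining lengths and the running crc32, whereas this hypothetical `RewindableReader` simply reopens the member through a caller-supplied `opener` callable. Only `SEEK_SET` and `SEEK_CUR` are handled, to keep the sketch short.

```python
import io
import zipfile


class RewindableReader:
    """Hypothetical seek emulation on top of a forward-only stream."""

    def __init__(self, opener):
        self._opener = opener      # callable returning a fresh stream at offset 0
        self._stream = opener()
        self._pos = 0

    def read(self, size=-1):
        data = self._stream.read(size)
        self._pos += len(data)
        return data

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        else:
            raise ValueError("this sketch supports only SEEK_SET and SEEK_CUR")
        if target < 0:
            raise ValueError("negative seek position")
        if target < self._pos:
            # Backward seek: reset to the start, then read forward again.
            self._stream.close()
            self._stream = self._opener()
            self._pos = 0
        # Forward seek: read and discard until the target is reached (linear cost).
        remaining = target - self._pos
        while remaining > 0:
            chunk = self.read(min(remaining, 64 * 1024))
            if not chunk:
                break  # hit end of stream before reaching the target
            remaining -= len(chunk)
        return self._pos


# Illustrative data: 25,600 bytes in one compressed member.
payload = bytes(range(256)) * 100
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", payload)

with zipfile.ZipFile(archive) as zf:
    reader = RewindableReader(lambda: zf.open("data.bin"))
    reader.seek(1000)
    forward_bytes = reader.read(4)
    reader.seek(10)                 # backward: triggers reset-and-read
    backward_bytes = reader.read(3)
    final_position = reader.tell()
```

Both directions cost time proportional to the distance read, which is the linear complexity raised earlier in the thread.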

    I pushed this into the latest upstream branch, but as I am testing this in v3.4 it should be easy to backport if necessary (I suspect not).

    I based my fix on a little program that I wrote to test the feasibility of this idea. I am attaching that test program here.

    @gpshead gpshead assigned gpshead and unassigned serhiy-storchaka Jan 30, 2018

    gpshead commented Jan 30, 2018

    New changeset 066df4f by Gregory P. Smith (John Jolly) in branch 'master':
    bpo-22908: Add seek and tell functionality to ZipExtFile (GH-4966)
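
With this change in place, the zip-within-a-zip use case from the thread can be tried directly. The sketch below assumes Python 3.7 or later and an outer archive opened from a seekable file; the archive contents and the helper names `make_nested_zip` and `read_nested` are made up for the illustration.

```python
import io
import zipfile


def make_nested_zip() -> bytes:
    """Build an outer zip whose single member is itself a zip (illustrative data)."""
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w", zipfile.ZIP_DEFLATED) as inner:
        inner.writestr("note.txt", "nested hello")

    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w", zipfile.ZIP_DEFLATED) as outer:
        outer.writestr("inner.zip", inner_buf.getvalue())
    return outer_buf.getvalue()


def read_nested(outer_bytes: bytes) -> bytes:
    """Open the inner archive straight from the outer member, without extracting it."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        with outer.open("inner.zip") as member:
            # ZipFile needs seek() and tell() on its file object to locate the
            # central directory; ZipExtFile provides both from 3.7 onward.
            with zipfile.ZipFile(member) as inner:
                return inner.read("note.txt")


nested_text = read_nested(make_nested_zip())
```

Because the inner member here is compressed, seeks inside it still decompress data linearly, so this is convenient rather than fast for large nested archives.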

    @gpshead gpshead added the 3.7 (EOL) end of life label Jan 30, 2018
    @gpshead gpshead closed this as completed Jan 30, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022