ZipExtFile in zipfile can be seekable #67097
Comments
The ZipExtFile class in the zipfile module does not provide a seek method like GzipFile does. As a result, it is hard to manipulate files without extracting all the content. In fact, the seek method in GzipFile or some other compressed format can be implemented in zipfile very easily. Here is my naive modification (nearly the same as in GzipFile).
I'm -1 on adding a seek method with linear complexity. This looks like attractive nonsense to me. It would be better to just make TarFile work with non-seekable streams.
Actually, TarFile already works with non-seekable streams. Use TarFile.open() with mode='r|*' or similar. On the other hand, I'm not against making non-compressed ZipExtFile seekable. It can be helpful when a ZIP file is used just as a container for other files.
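For readers unfamiliar with the stream mode mentioned above, here is a minimal sketch of reading a tar archive from an object that only supports `read()`. The `ReadOnlyStream` class and the helper names are hypothetical and exist only for this illustration; the archive is built in memory so the example is self-contained.

```python
import io
import tarfile


class ReadOnlyStream:
    """Hypothetical stand-in for a pipe or socket: it offers read() only."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n=-1):
        return self._buf.read(n)


def build_tar_bytes():
    """Create a small gzip-compressed tar archive in memory."""
    buf = io.BytesIO()
    payload = b"hello from a stream"
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo(name="greeting.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_members(stream):
    """Read every regular file from a non-seekable stream, in archive order."""
    contents = {}
    # 'r|*' selects stream mode with transparent compression detection.
    with tarfile.open(fileobj=stream, mode="r|*") as tf:
        for member in tf:
            extracted = tf.extractfile(member)
            if extracted is not None:
                contents[member.name] = extracted.read()
    return contents
```

In stream mode the members have to be consumed in the order they appear, since the underlying object cannot go backwards.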
It would be great to have the ZipExtFile class seekable. For example, pydicom would gain the ability to read from ZIP files, as mentioned in pydicom/pydicom#219.
To add to this (without looking at the patch): I just learned, to my astonishment, that a ZipExtFile doesn't even support tell(). I can understand seek being nontrivial... but tell? It's a bytestream, and there is (isn't there?) a clear definition of which byte a read(1) would deliver next. It should be trivial to keep track of the (only ever increasing) file position.
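The position tracking described in the comment above can be sketched as a thin wrapper that counts the bytes it hands out. `TellingReader` and `make_demo_zip` are hypothetical names used only for illustration; this is not the zipfile module's own implementation.

```python
import io
import zipfile


class TellingReader:
    """Hypothetical wrapper that tracks how many bytes read() has returned."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._pos = 0  # only ever increases, since there is no seek

    def read(self, n=-1):
        data = self._fileobj.read(n)
        self._pos += len(data)
        return data

    def tell(self):
        return self._pos


def make_demo_zip(name, payload):
    """Build a one-member deflated ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    buf.seek(0)
    return buf
```

A possible use: wrap the object returned by `ZipFile.open()` and call `tell()` between reads to learn the offset into the uncompressed data.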
Please be gentle, this is my first submission to Python. My use case was a recursive zip-within-a-zip situation. I wanted to allow the creation of a zipfile.ZipFile from an existing zipfile.ZipExtFile, but the lack of seek prevented this. I simply treated forward seeks as a read, and backward seeks as a reset-and-read. The reset was the tricky part, as it required restoring several original values such as the remaining compressed length, the remaining data length, and the running crc32. I pushed this into the latest upstream branch, but as I am testing this in v3.4 it should be easy to backport if necessary (I suspect not). I based my fix on a little program that I wrote to test the feasibility of this idea. I am attaching that test program here.
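The idea in the comment above (forward seek as a read, backward seek as a reset followed by a read) can be sketched roughly as follows. This is not the attached patch: instead of restoring internal counters and the running crc32, this sketch resets by closing and reopening the member, which is a simpler substitute. `ReopeningSeeker` and `make_nested_demo_zip` are hypothetical names.

```python
import io
import zipfile


class ReopeningSeeker:
    """Hypothetical wrapper that emulates seek() with linear-time reads."""

    def __init__(self, archive, name):
        self._archive = archive
        self._name = name
        self._stream = archive.open(name)
        self._pos = 0

    def read(self, n=-1):
        data = self._stream.read(n)
        self._pos += len(data)
        return data

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            target = self._archive.getinfo(self._name).file_size + offset
        else:
            raise ValueError("invalid whence value")
        target = max(target, 0)
        if target < self._pos:
            # Backward seek: start over from the beginning of the member.
            self._stream.close()
            self._stream = self._archive.open(self._name)
            self._pos = 0
        # Forward seek: read and discard until the target is reached.
        remaining = target - self._pos
        while remaining > 0:
            chunk = self.read(min(remaining, 64 * 1024))
            if not chunk:
                break  # hit end of member before reaching the target
            remaining -= len(chunk)
        return self._pos


def make_nested_demo_zip(name, payload):
    """Build a one-member deflated ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    buf.seek(0)
    return buf
```

Every backward seek costs a full re-read from the start of the member, which is the linear complexity objected to earlier in the thread.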