Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested Zip not supported in Python < 3.7 #41

Closed
belldandyxtq opened this issue Aug 16, 2019 · 0 comments
Closed

Nested Zip not supported in Python < 3.7 #41

belldandyxtq opened this issue Aug 16, 2019 · 0 comments
Labels
cat:bug Bug report or fix.

Comments

@belldandyxtq
Copy link
Member

belldandyxtq commented Aug 16, 2019

Background
nested zip is zip inside a zip. For example we can have file in nested.zip in outside.zip as follow:

outside.zip
| - nested.zip
| - | - file

The following code is an example of how ChainerIO actually accesses such nested zipfile.

with zipfile.ZipFile("outside.zip", "rb") as outside_zip:
       with outside_zip.open("nested.zip", "rb") as inside_f:
              with zipfile.ZipFile(inside_f) as inside_zip:
                     with inside_f.open("file", "rb") as f:
                              f.read()

Both file_path and file object can be passed to create a zipfile.ZipFile.
Upon creating the zipfile.ZipFile, the zipfile module reads the header of the file to check whether the given file is a zip file. When checking, a "seek" is performed on the given file/file object, which requires the given file/file object to be seekable.

Problem

In Python<3.7, the ZipExtFile, which is the returned file object by the zipfile.open, i.e. the inside_f in the above code, is not seekable in any case. Which leads to failure on creation of ZipFile of the nested zip.

The problem was fixed in Python 3.7 as code

@belldandyxtq belldandyxtq added the cat:bug Bug report or fix. label Aug 16, 2019
kuenishi pushed a commit that referenced this issue Sep 17, 2019
**Background**
`nested zip` is zip inside a zip. For example we can have `file` in `nested.zip` in `outside.zip` as follow:
```
outside.zip
| - nested.zip
| - | - file
```

The following code is an example of how ChainerIO actually accesses such nested zipfile.
```
with zipfile.ZipFile("outside.zip", "rb") as outside_zip:
       with outside_zip.open("nested.zip", "rb") as inside_f:
              with zipfile.ZipFile(inside_f) as inside_zip:
                     with inside_f.open("file", "rb") as f:
                              f.read()
```
Both `file_path` and `file object` can be passed to create a `zipfile.ZipFile`.
Upon creating the `zipfile.ZipFile`, the zipfile module reads the header of the file to check whether the given file is a zip file. When checking, a "[seek](https://github.com/python/cpython/blob/13a19139b5e76175bc95294d54afc9425e4f36c9/Lib/zipfile.py#L271)" is performed on the given file/file object, which requires the given file/file object to be `seekable`.

**Problem**

In Python<3.7, the [ZipExtFile](https://github.com/python/cpython/blob/13a19139b5e76175bc95294d54afc9425e4f36c9/Lib/zipfile.py#L760), which is the returned file object by the zipfile.open, i.e. the `inside_f` in the above code, is not `seekable` in any case. Which leads to failure on creation of `ZipFile` of the nested zip.

The problem was fixed in Python 3.7 as [code](https://github.com/python/cpython/blob/ed44b84961eb0e5b97e4866c1455ac4093d27549/Lib/zipfile.py#L819)

This commit adds nested zip support for Python< 3.7.
Unlike the Python > 3.7, due to implementation issues in Python,
the nested zip support is not enabled by default.
A workaround is added to support that, which possible performance and
memory consumption issues.
A warning message is generated in that case.
For more details, please refer to the comments in
`containers/zip.py`.
#41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:bug Bug report or fix.
Projects
None yet
Development

No branches or pull requests

1 participant