**Background**
A `nested zip` is a zip file inside another zip file. For example, we can have `file` in `nested.zip`, which is itself in `outside.zip`, as follows:
```
outside.zip
| - nested.zip
| - | - file
```
The following code is an example of how ChainerIO accesses such a nested zip file.
```
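import zipfile

# ZipFile and ZipFile.open accept mode "r" for reading, not "rb".
with zipfile.ZipFile("outside.zip", "r") as outside_zip:
    with outside_zip.open("nested.zip", "r") as inside_f:
        # ZipFile needs a seekable file object here.
        with zipfile.ZipFile(inside_f) as inside_zip:
            with inside_zip.open("file", "r") as f:
                f.read()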
```
Either a `file_path` or a `file object` can be passed to create a `zipfile.ZipFile`.
When a `zipfile.ZipFile` is created, the `zipfile` module reads the header of the file to check whether it is a zip file. This check performs a "[seek](https://github.com/python/cpython/blob/13a19139b5e76175bc95294d54afc9425e4f36c9/Lib/zipfile.py#L271)" on the given file or file object, so the file or file object must be `seekable`.
**Problem**
In Python < 3.7, the [ZipExtFile](https://github.com/python/cpython/blob/13a19139b5e76175bc95294d54afc9425e4f36c9/Lib/zipfile.py#L760), which is the file object returned by `ZipFile.open`, i.e. `inside_f` in the code above, is never `seekable`. As a result, creating the `ZipFile` for the nested zip fails.
The problem was fixed in Python 3.7; see the [code](https://github.com/python/cpython/blob/ed44b84961eb0e5b97e4866c1455ac4093d27549/Lib/zipfile.py#L819).
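To see which behavior a given interpreter has, the following minimal sketch builds the nested layout in memory and asks the member file object whether it is seekable. The helper names `build_nested_zip` and `member_is_seekable` are made up for this illustration and are not part of ChainerIO or `zipfile`.

```python
import io
import sys
import zipfile


def build_nested_zip() -> bytes:
    """Return the bytes of outside.zip, which holds nested.zip, which holds file."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as inside_zip:
        inside_zip.writestr("file", b"hello")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as outside_zip:
        outside_zip.writestr("nested.zip", inner.getvalue())
    return outer.getvalue()


def member_is_seekable(outer_bytes: bytes) -> bool:
    """Report whether the file object for nested.zip supports seek()."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes), "r") as outside_zip:
        with outside_zip.open("nested.zip", "r") as inside_f:
            return inside_f.seekable()


if __name__ == "__main__":
    print(sys.version_info[:2], member_is_seekable(build_nested_zip()))
```

When the result is `False`, passing `inside_f` directly to `zipfile.ZipFile` is expected to fail as described above.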
This commit adds nested zip support for Python < 3.7.
Unlike in Python >= 3.7, due to implementation issues in Python,
nested zip support is not enabled by default.
A workaround is added to support it, which may cause performance and
memory consumption issues.
A warning message is generated in that case.
For more details, please refer to the comments in
`containers/zip.py`.
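As a rough illustration of why such a workaround can cost time and memory, here is one plausible approach: read the whole nested archive into an `io.BytesIO` buffer, which is always seekable, and open that instead. This is an assumption for illustration only; the actual workaround in `containers/zip.py` may differ, and `open_nested_zip` and `make_outside_zip` are hypothetical names.

```python
import io
import warnings
import zipfile


def open_nested_zip(outside_zip: zipfile.ZipFile, name: str) -> zipfile.ZipFile:
    """Open a nested zip by buffering it fully in memory (hypothetical helper)."""
    warnings.warn(
        "Buffering the nested zip %r in memory; this may be slow and "
        "memory-hungry for large archives." % name,
        RuntimeWarning,
    )
    with outside_zip.open(name, "r") as inside_f:
        # The whole member is decompressed into memory here.
        buffered = io.BytesIO(inside_f.read())
    return zipfile.ZipFile(buffered, "r")


def make_outside_zip() -> io.BytesIO:
    """Build outside.zip > nested.zip > file in memory for the demo."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as inside_zip:
        inside_zip.writestr("file", b"hello")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as outside_zip:
        outside_zip.writestr("nested.zip", inner.getvalue())
    outer.seek(0)
    return outer


def read_nested_file() -> bytes:
    """Read `file` from the nested archive using the buffering helper."""
    with zipfile.ZipFile(make_outside_zip(), "r") as outside_zip:
        with open_nested_zip(outside_zip, "nested.zip") as inside_zip:
            with inside_zip.open("file", "r") as f:
                return f.read()
```

The trade-off is that the full nested archive lives in memory for as long as the returned `ZipFile` is open, which is why a warning fits this path.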
#41