Support for streaming from ZIP archives broken since 6.0? #747

Closed
akutuzov opened this issue Dec 18, 2022 · 1 comment
@akutuzov

Problem description

Hi,
Versions of smart_open before 6.0.0 successfully open file objects returned by the zipfile module.

However, starting from 6.0.0 the same code yields
TypeError: don't know how to handle uri <zipfile.ZipExtFile name='FILENAME' mode='r' compress_type=deflate>
(where FILENAME is the actual name of the file inside the archive)

Looks like a regression to me. Is it somehow related to this PR?

Anyway, this is really inconvenient. Is there now any other way of reading files from ZIP archives directly, without decompressing on disk?

Many thanks for clarifying this.

Steps/code to reproduce the problem

Minimal reproducible example:

import zipfile
from smart_open import open
zf = zipfile.ZipFile("ANY_ZIP_FILE")
file2read = zf.open("FILENAME_INSIDE_ARCHIVE")
data = open(file2read, 'r')

This code works flawlessly on smart_open 5.2.1 but fails on smart_open 6.0.0 with:

Traceback (most recent call last):
  File "test_sm.py", line 13, in <module>
    data = open(file2read, 'r')
  File ".../lib/python3.9/site-packages/smart_open/smart_open_lib.py", line 224, in open
    binary = _open_binary_stream(uri, binary_mode, transport_params)
  File ".../lib/python3.9/site-packages/smart_open/smart_open_lib.py", line 396, in _open_binary_stream
    raise TypeError("don't know how to handle uri %s" % repr(uri))
TypeError: don't know how to handle uri <zipfile.ZipExtFile name='FILENAME_INSIDE_ARCHIVE' mode='r' compress_type=deflate>

Versions

I observe this on at least two independent machines with different Python versions:

Linux-5.4.0-131-generic-x86_64-with-glibc2.29
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0]
smart_open 6.0.0

Linux-3.10.0-1160.66.1.el7.x86_64-x86_64-with-glibc2.17
Python 3.9.5 (default, Jun 15 2021, 17:07:44) 
[GCC 10.3.0]
smart_open 6.0.0
@mpenkov
Collaborator

mpenkov commented Dec 18, 2022

We dropped support for your specific use case because it is inconsistent with the built-in open function, which only handles string input for the first parameter.

Yes, you've correctly pointed to the PR that dropped the support.

You can still achieve what you want by following the instructions here.
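
The linked instructions are not reproduced in this thread. As a rough illustration of reading a ZIP member without extracting it to disk, here is a minimal sketch that uses only the standard library (`zipfile` plus `io.TextIOWrapper`). It is not necessarily the approach the link describes, and the helper name `iter_zip_member_lines` is hypothetical, introduced only for this example.

```python
import io
import zipfile


def iter_zip_member_lines(zip_source, member_name, encoding="utf-8"):
    """Yield decoded text lines from one archive member, without extracting to disk.

    Hypothetical helper for illustration; not part of smart_open.
    zip_source may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        # zf.open() returns a read-only binary stream that decompresses on the fly.
        with zf.open(member_name) as raw:
            # TextIOWrapper adds the decoding that text mode 'r' would otherwise provide.
            with io.TextIOWrapper(raw, encoding=encoding) as text_stream:
                yield from text_stream
```

With the placeholders from the report, usage would look like `for line in iter_zip_member_lines("ANY_ZIP_FILE", "FILENAME_INSIDE_ARCHIVE"): ...`. The archive is kept open only while the generator is being consumed.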

mpenkov closed this as completed Dec 18, 2022