Passing File Object Not Filename To GzipFile and BZ2File #109

Closed
simonseed opened this issue Mar 16, 2017 · 4 comments

@simonseed

Here

https://github.com/RaRe-Technologies/smart_open/blob/master/smart_open/smart_open_lib.py#L626

and here

https://github.com/RaRe-Technologies/smart_open/blob/master/smart_open/smart_open_lib.py#L630

Should be passing in the filename not the file object.

@tmylk
Contributor

tmylk commented Mar 16, 2017

Reproduced. Working on a fix and test to read/write compressed files. In particular, it broke gensim Travis tests.

For completeness, could you paste a code snippet that breaks for you in this version?

CC @robottwo

@tmylk tmylk added the bug label Mar 16, 2017
@senatet

senatet commented Mar 16, 2017

Hi.

I am also seeing this bug when attempting to use gensim, which uses smart_open to open gz compressed files... here is a minimal reproduction of the issue:


➜  /tmp virtualenv venv
New python executable in venv/bin/python
Installing setuptools, pip...done.
➜  /tmp source venv/bin/activate
(venv)➜  /tmp pip install -U pip smart_open
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting pip from https://pypi.python.org/packages/b6/ac/7015eb97dc749283ffdec1c3a88ddb8ae03b8fad0f0e611408f196358da3/pip-9.0.1-py2.py3-none-any.whl#md5=297dbd16ef53bcef0447d245815f5144
  Using cached pip-9.0.1-py2.py3-none-any.whl
Collecting smart-open
  Using cached smart_open-1.5.0.tar.gz
Collecting boto>=2.32 (from smart-open)
  Using cached boto-2.46.1-py2.py3-none-any.whl
Collecting bz2file (from smart-open)
  Using cached bz2file-0.98.tar.gz
Collecting requests (from smart-open)
  Using cached requests-2.13.0-py2.py3-none-any.whl
Installing collected packages: requests, bz2file, boto, smart-open, pip

  Running setup.py install for bz2file

  Running setup.py install for smart-open
  Found existing installation: pip 6.0.8
    Uninstalling pip-6.0.8:
      Successfully uninstalled pip-6.0.8

Successfully installed boto-2.46.1 bz2file-0.98 pip-9.0.1 requests-2.13.0 smart-open-1.5.0
(venv)➜  /tmp 
(venv)➜  /tmp echo 'test text' | gzip > test_text.gz
(venv)➜  /tmp zcat test_text.gz
test text
(venv)➜  /tmp python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import smart_open
>>> fname = './test_text.gz'
>>> fd = smart_open.smart_open(fname)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 138, in smart_open
    return file_smart_open(parsed_uri.uri_path, mode)
  File "/tmp/venv/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 642, in file_smart_open
    return compression_wrapper(open(fname, mode), fname, mode)
  File "/tmp/venv/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 630, in compression_wrapper
    return make_closing(GzipFile)(file_obj, mode)
  File "/usr/lib/python2.7/gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: coercing to Unicode: need string or buffer, file found
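
The traceback above shows the file object arriving in the first positional slot of GzipFile, which the standard library treats as `filename` and tries to open again. As a minimal sketch of the difference (standard-library `gzip` only; the helper names `read_positional` and `read_keyword` are hypothetical and not part of smart_open, and this is not necessarily how the fix was implemented), passing the open handle through the `fileobj` keyword avoids the second `open()` call:

```python
import gzip
import io

# Build a small gzip payload in memory so the sketch needs no files on disk.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    gz.write(b"test text\n")
payload = raw.getvalue()


def read_positional(data):
    """Mimic the failing call: the file object lands in the `filename` slot."""
    return gzip.GzipFile(io.BytesIO(data), "rb").read()


def read_keyword(data):
    """Pass the already-open file object explicitly as `fileobj`."""
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as gz:
        return gz.read()


# The positional form fails because gzip tries to open() the file object.
try:
    read_positional(payload)
    positional_failed = False
except TypeError:
    positional_failed = True

# The keyword form decompresses the stream as expected.
text = read_keyword(payload)
```

Either passing the filename or passing the handle as `fileobj=` would avoid the error; the keyword form also works for file-like objects that have no path on disk.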

@simonseed
Author

model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

@tmylk tmylk closed this as completed in 1683156 Mar 17, 2017
@tmylk
Contributor

tmylk commented Mar 17, 2017

Thanks for reporting. Fixed in #110 and released in 1.5.1 on PyPI.

tmylk pushed a commit that referenced this issue Mar 27, 2017
…and #110. (#112)

* Better support for custom S3 servers.

This patch adds support for custom S3 servers in the connection string.
It also adds explicit support for setting the server port, and whether
or not to use SSL, both as parameters to the smart_open function as
well as within the connection string.

These changes are necessary to be able to connect to s3proxy and
other custom s3 servers which don't run on the default port,
or necessarily use SSL.

* Fix unit tests

* updated README.rst with new s3 mode.

* Added a new unit test for the unsecured calling form

* Updated style and unit test.

* Check that the port argument isn't normally passed.

* Add generic HTTP and HTTPS streaming support.

Adds support for opening vanilla HTTP and HTTPS addresses.
Supports efficient streaming, gzip and bz2 compression,
as well as Kerberos and username/password (basic) http
authentication.

* removed previous merge artifact;

* Raise exception instead of returning it :/

* Raise http exceptions properly

* necessary import

* python 3 compatibility

* Reverted make_closing -> closing

We still want to maintain Python 2.6 compatibility,
so don't rely on contextlib.closing.

* Refactor the code to get the Python version

* Refactored the GZfile and BZ2File compression wrappers.

* Refactored HttpOpenRead unit tests.

Now they don't require internet access, and will test for
Basic authentication in the HTTP header.

* Clean up http unit tests.

http => https, and remove old versions of the tests.

* Cosmetic changes and doc updates.

* Re-use the open filehandle rather than open a new one.

This allows one to use any filehandle-like object instead of
just local posix. It also avoids unnecessary filesystem syscalls.

* merge artifact

* Add unit tests for compressed httpd reads.

This breaks out the http tests into their own test class.

Also fixed a few behaviors in the HttpReader uncovered by
the new tests (yay).

* fixed import for python3

* removed stray import

* Handle some python3 byte vs unicode incompatibilities.

Works now on Python 2 as well as Python 3.