
S3 file like objects - can they implement a flush method? #113

Closed
pmajmudar opened this issue Mar 17, 2017 · 10 comments
Comments

@pmajmudar

It would be useful if S3 file-like objects also implemented the "flush" method (which is available on local file like objects)

This would make it easier to use as a drop-in replacement for file-like objects.

This could be a no-op or could genuinely force a flush of data.

What do you think?

@tmylk
Contributor

tmylk commented Mar 17, 2017

Is there an internal buffer in boto that needs to be flushed?

@pmajmudar
Author

pmajmudar commented Mar 17, 2017

I don't think so... that's not quite what I meant sorry.

I'm trying to handle the situation where I'm passing the file object into a third-party library that I don't have control over. That library calls the "flush" method. So ordinarily I do:

```python
f = open('test.csv', 'w')
library.library_method(f)  # everything works ok - library_method calls flush - no errors
```

vs

```python
f = smart_open('s3://bucket/blah/test.csv', 'w')
library.library_method(f)  # fails - flush doesn't exist
```

I.e. it would be nice if smart_open could be used as a direct replacement for open. As I said above, flush doesn't have to do anything; it can just be a no-op. If the data genuinely can be flushed, even better.

@piskvorky
Owner

I guess we can do that, sure. Any other methods of that sort? Can you add them all in a PR?

Plus add clear motivation & comments to those no-op methods too, so someone doesn't remove them in the future by accident.
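A minimal sketch of what such a no-op flush with its motivating comment might look like. The class and attribute names here are purely illustrative, not smart_open's actual internals:

```python
# Illustrative sketch only: class and attribute names are hypothetical,
# not smart_open's real implementation.
class S3WriterSketch:
    def __init__(self):
        self._parts = []  # buffered data destined for a multipart upload

    def write(self, data):
        self._parts.append(data)
        return len(data)

    def flush(self):
        # Intentionally a no-op. Third-party libraries that accept
        # file-like objects often call flush(); without this method,
        # passing this object where a built-in file is expected raises
        # AttributeError. Do not remove.
        pass
```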

@ghost mentioned this issue Mar 18, 2017
@ghost

ghost commented Mar 18, 2017

@tmylk I have made a PR resolving this issue as you asked. Please review.

@tmylk
Contributor

tmylk commented Mar 20, 2017

This request is similar to the earlier request in #64 (comment) to inherit from, or duck-type, the io.IOBase interface. See the Python docs for the list of other methods that need to be added; they differ only slightly across Python versions.

@pmajmudar
Author

Yes - I was thinking a sensible implementation might inherit from io.IOBase

I'll find some time to work on this and open a PR.
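To illustrate why inheriting from io.IOBase is attractive: the base class supplies sensible defaults (a no-op flush(), readable()/writable()/seekable() returning False, context-manager support), so only the methods that do real work need overriding. The class below is a toy sketch, not the actual smart_open implementation:

```python
import io

# Toy sketch: an in-memory stand-in for an S3 reader, showing the
# methods io.IOBase provides for free. Names are illustrative.
class S3ReaderSketch(io.IOBase):
    def __init__(self, chunks):
        self._buffer = b''.join(chunks)  # pretend these came from S3
        self._pos = 0

    def readable(self):
        return True

    def read(self, size=-1):
        if size < 0:
            size = len(self._buffer) - self._pos
        data = self._buffer[self._pos:self._pos + size]
        self._pos += len(data)
        return data

f = S3ReaderSketch([b'hello ', b'world'])
f.flush()         # inherited no-op: no AttributeError
print(f.read())   # b'hello world'
```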

@mpenkov
Collaborator

mpenkov commented Dec 10, 2017

@menshikh-iv This is done.

```
(smartopen)sergeyich:issue152 misha$ python
Python 3.6.3 (default, Oct  4 2017, 06:09:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import smart_open
>>> f = smart_open.smart_open('s3://private/key.txt')
>>> dir(f)
['__abstractmethods__', '__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '_abc_registry', '_buffer', '_buffer_size', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_content_length', '_current_pos', '_eof', '_fill_buffer', '_line_terminator', '_object', '_raw_reader', '_read_from_buffer', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'terminate', 'truncate', 'writable', 'write', 'writelines']
```

@menshikh-iv
Contributor

Thanks @mpenkov
Works fine with current master fbc82cc

@peter-t-kim

peter-t-kim commented Jun 20, 2019

Hello Everyone,

I see that @mpenkov's comment shows the f object has a 'fileno' method. However, after migrating to the new open function, I no longer see 'fileno' as a method on this object. This is unfortunate, as I would like to use the return value of smart_open.open() as the stdin parameter to a Popen call, but since it is not implemented it throws this error:
[ERROR] UnsupportedOperation: fileno

Is there a reason the fileno operation is no longer supported? Or am I missing something?

@mpenkov
Collaborator

mpenkov commented Jun 21, 2019

@peter-t-kim I'm not sure at which stage we got rid of fileno. I don't think it's a product of the change to the open function (that change happened at a different abstraction level). You can confirm this by checking out the commit in my previous comment (fbc82cc) and then:

```
In [6]: fin = open('s3://bucket/key')  # redacted

In [7]: 'fileno' in dir(fin)
Out[7]: True

In [8]: fin.fileno()
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-8-63d0a20bf555> in <module>
----> 1 fin.fileno()

UnsupportedOperation: fileno
```

When reading from S3, smart_open does not create any new file descriptors. File descriptors get created by the OS when opening local files; when reading from S3, we bypass that abstraction level entirely. So I don't think it's possible to even have a file descriptor in your situation.

If you want to pipe data from S3 into a subprocess, you have to do something like this.
