
S3 file like objects - can they implement a flush method? #113

Closed
pmajmudar opened this issue Mar 17, 2017 · 10 comments
Comments

@pmajmudar

It would be useful if S3 file-like objects also implemented the "flush" method (which is available on local file like objects)

This would make it easier to use as a drop-in replacement for file-like objects.

This could be a no-op or could genuinely force a flush of data.

What do you think?

@tmylk
Contributor

tmylk commented Mar 17, 2017

Is there an internal buffer in boto that needs to be flushed?

@pmajmudar
Author

pmajmudar commented Mar 17, 2017

I don't think so... that's not quite what I meant sorry.

I'm trying to handle the situation where I'm passing the file object into a third-party library that I don't have control over. That library calls the "flush" method. So ordinarily I do:

```python
f = open('test.csv', 'w')
library.library_method(f)  # everything works ok - library_method calls flush - no errors
```

vs

```python
f = smart_open('s3://bucket/blah/test.csv', 'w')
library.library_method(f)  # fails - flush doesn't exist
```

I.e. it would be nice if smart_open could be used as a direct replacement for open. As I said above, flush doesn't have to do anything; it can just be a no-op. If the data genuinely can be flushed, even better.

@piskvorky
Owner

I guess we can do that, sure. Any other methods of that sort? Can you add them all in a PR?

Plus add clear motivation & comments to those no-op methods too, so someone doesn't remove them in the future by accident.
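A minimal sketch of what such a no-op flush with its motivating comment might look like. The class and attribute names here are purely illustrative, not smart_open's actual internals:

```python
# Illustrative sketch only: class and attribute names are hypothetical,
# not smart_open's real implementation.
class S3WriterSketch:
    def __init__(self):
        self._parts = []  # buffered data destined for a multipart upload

    def write(self, data):
        self._parts.append(data)
        return len(data)

    def flush(self):
        # Intentionally a no-op. Third-party libraries that accept
        # file-like objects often call flush(); without this method,
        # passing this object where a built-in file is expected raises
        # AttributeError. Do not remove.
        pass
```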

@ghost mentioned this issue Mar 18, 2017
@ghost

ghost commented Mar 18, 2017

@tmylk I have made a PR resolving this issue as you asked. Please review.

@tmylk
Contributor

tmylk commented Mar 20, 2017

This request is similar to the earlier request in #64 (comment) to inherit from, or duck-type, the io.IOBase interface. See the Python docs for the list of other methods that need to be added; they differ only slightly across Python versions.

@pmajmudar
Author

Yes - I was thinking a sensible implementation might inherit from io.IOBase

I'll find some time to work on this and open a PR.
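To illustrate why inheriting from io.IOBase is attractive: the base class supplies sensible defaults (a no-op flush(), readable()/writable()/seekable() returning False, context-manager support), so only the methods that do real work need overriding. The class below is a toy sketch, not the actual smart_open implementation:

```python
import io

# Toy sketch: an in-memory stand-in for an S3 reader, showing the
# methods io.IOBase provides for free. Names are illustrative.
class S3ReaderSketch(io.IOBase):
    def __init__(self, chunks):
        self._buffer = b''.join(chunks)  # pretend these came from S3
        self._pos = 0

    def readable(self):
        return True

    def read(self, size=-1):
        if size < 0:
            size = len(self._buffer) - self._pos
        data = self._buffer[self._pos:self._pos + size]
        self._pos += len(data)
        return data

f = S3ReaderSketch([b'hello ', b'world'])
f.flush()         # inherited no-op: no AttributeError
print(f.read())   # b'hello world'
```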

@mpenkov
Collaborator

mpenkov commented Dec 10, 2017

@menshikh-iv This is done.

```
(smartopen)sergeyich:issue152 misha$ python
Python 3.6.3 (default, Oct  4 2017, 06:09:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import smart_open
>>> f = smart_open.smart_open('s3://private/key.txt')
>>> dir(f)
['__abstractmethods__', '__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '_abc_registry', '_buffer', '_buffer_size', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_content_length', '_current_pos', '_eof', '_fill_buffer', '_line_terminator', '_object', '_raw_reader', '_read_from_buffer', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'terminate', 'truncate', 'writable', 'write', 'writelines']
```

@menshikh-iv
Contributor

Thanks @mpenkov
Works fine with current master fbc82cc

@peter-t-kim

peter-t-kim commented Jun 20, 2019

Hello Everyone,

I see that @mpenkov's comment shows the f object has a 'fileno' method. However, after migrating to the new open function, I no longer see 'fileno' as a method on this object. This is unfortunate, as I would like to use the return value of smart_open.open() as the stdin parameter to a Popen call, but since it is not implemented it throws this error:
[ERROR] UnsupportedOperation: fileno

Is there a reason the fileno operation is no longer supported? Or am I missing something?

@mpenkov
Collaborator

mpenkov commented Jun 21, 2019

@peter-t-kim I'm not sure at which stage we got rid of fileno. I don't think it's a product of the change to the open function (that change happened at a different abstraction level). You can confirm this by checking out the commit in my previous comment (fbc82cc) and then:

```
In [6]: fin = open('s3://bucket/key')  # redacted

In [7]: 'fileno' in dir(fin)
Out[7]: True

In [8]: fin.fileno()
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-8-63d0a20bf555> in <module>
----> 1 fin.fileno()

UnsupportedOperation: fileno
```

When reading from S3, smart_open does not create any new file descriptors. File descriptors get created by the OS when opening local files; when reading from S3, we bypass that abstraction level entirely. So I don't think it's possible to even have a file descriptor in your situation.

If you want to pipe data from S3 into a subprocess, you have to do something like this.
