BufferedIOBase.readinto1 is missing #64777
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = None closed_at = <Date 2014-06-22.21:17:55.992> created_at = <Date 2014-02-10.00:13:49.017> labels = ['type-feature', 'library'] title = 'BufferedIOBase.readinto1 is missing' updated_at = <Date 2014-06-22.21:17:55.991> user = 'https://bugs.python.org/nikratio'
activity = <Date 2014-06-22.21:17:55.991> actor = 'python-dev' assignee = 'none' closed = True closed_date = <Date 2014-06-22.21:17:55.992> closer = 'python-dev' components = ['Library (Lib)'] creation = <Date 2014-02-10.00:13:49.017> creator = 'nikratio' dependencies =  files = ['34632', '34793', '34811', '34812', '34863', '34864', '35539', '35647', '35648'] hgrepos =  issue_num = 20578 keywords = ['patch'] message_count = 22.0 messages = ['210794', '210795', '214930', '215263', '215979', '215984', '215988', '215989', '216049', '216053', '216055', '216056', '216059', '216266', '220012', '220015', '220018', '220061', '220062', '220662', '220665', '221313'] nosy_count = 8.0 nosy_names = ['loewis', 'pitrou', 'vstinner', 'benjamin.peterson', 'stutzbach', 'nikratio', 'python-dev', 'hynek'] pr_nums =  priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue20578' versions = ['Python 3.5']
I have attached a patch that adds readinto1() to BufferedReader and BufferedRWPair.
An example use case for this method is receiving a large stream over a protocol like HTTP. You want to use a buffered reader so you can efficiently parse the header, but after that you want to stream the data as it comes in, i.e. you want to use read1 or, for improved performance, readinto1.
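The use case above could look roughly like the following. This is only a sketch: the in-memory stream stands in for a socket, `read_body` is a hypothetical helper name, and it assumes a Python version where `readinto1` is available on `io.BufferedReader` (3.5 or later).

```python
import io


def read_body(stream, chunk_size=8192):
    """Drain `stream` with readinto1(), reusing one preallocated buffer."""
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    body = bytearray()
    while True:
        # Returns whatever is already buffered, or does at most one raw read.
        n = stream.readinto1(view)
        if not n:  # 0 at EOF (None would mean "no data yet" on a non-blocking raw stream)
            break
        body += view[:n]
    return bytes(body)


# Stand-in for a network stream: header lines, a blank line, then the payload.
raw = io.BytesIO(b"HTTP/1.1 200 OK\r\nContent-Length: 11\r\n\r\nhello world")
reader = io.BufferedReader(raw)

# The buffered reader makes line-by-line header parsing cheap.
headers = []
while True:
    line = reader.readline()
    if line in (b"\r\n", b""):
        break
    headers.append(line.rstrip(b"\r\n"))

# After the header, switch to streaming the data as it arrives.
body = read_body(reader)
```

A real consumer would process each chunk as it arrives instead of accumulating it; the accumulation here only keeps the example checkable.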
Feedback is welcome.
Here's a little script to estimate the performance difference between using read1 and readinto1 to read large amounts of data. On my system, I get:
C readinto1: 4.960e-01 seconds
In other words, _pyio.BufferedReader.readinto1 is more than a factor of 2 faster than _pyio.BufferedReader.read1 and io.readinto1 is faster than io.read1 by about 20%.
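For readers without the attachment, a comparison along these lines can be sketched as follows. This is not the attached script: the helper names, payload size and buffer size are assumptions, and it reads from an in-memory `io.BytesIO`, so the absolute timings will not match the numbers quoted above.

```python
import io
import time


def drain_read1(reader, bufsize):
    """Read everything with read1(); every call allocates a new bytes object."""
    total = 0
    while True:
        chunk = reader.read1(bufsize)
        if not chunk:
            break
        total += len(chunk)
    return total


def drain_readinto1(reader, bufsize):
    """Read everything with readinto1(); one bytearray is reused throughout."""
    buf = bytearray(bufsize)
    total = 0
    while True:
        n = reader.readinto1(buf)
        if not n:
            break
        total += n
    return total


def timed(func, payload, bufsize=64 * 1024):
    """Return (bytes_read, elapsed_seconds) for one full pass over payload."""
    reader = io.BufferedReader(io.BytesIO(payload))
    start = time.perf_counter()
    total = func(reader, bufsize)
    return total, time.perf_counter() - start


payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes
n_read1, t_read1 = timed(drain_read1, payload)
n_readinto1, t_readinto1 = timed(drain_readinto1, payload)
print("read1:     %.3e seconds" % t_read1)
print("readinto1: %.3e seconds" % t_readinto1)
```

Swapping `io` for `_pyio` in `timed` would exercise the pure-Python implementation instead.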
On its own, I think this would justify keeping an implementation of readinto1 in _pyio.BufferedReader instead of falling back on the default (that is implemented using read1). However, I believe that people who need performance are probably not using _pyio but io, so *my* argument for keeping it implemented in _pyio is to keep the implementations similar.
I found studying _pyio very helpful to understand the C code in io. If we implement BufferedReader.readinto1 in io, but not in _pyio.BufferedReader, this advantage would be reduced.
That said, I am primarily interested in getting readinto1 into io. So I'm happy to either extend the patch to also provide a fast readinto implementation for _pyio (to align it with io), or to remove the readinto1 implementation in _pyio.
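To make "falling back on the default" concrete, a fallback of that kind might look like the sketch below. It is not the code from the patch or from _pyio: `readinto1_fallback` is a hypothetical name, and it deliberately handles only `bytearray` targets.

```python
import io


def readinto1_fallback(reader, b):
    """Emulate readinto1() on top of read1() for a bytearray target.

    read1() allocates a new bytes object, which is then copied into b.
    That extra allocation and copy is the overhead a dedicated
    readinto1() avoids.
    """
    data = reader.read1(len(b))
    n = len(data)
    b[:n] = data  # same-length slice assignment, so len(b) is unchanged
    return n


reader = io.BufferedReader(io.BytesIO(b"abcdef"))
buf = bytearray(4)
n = readinto1_fallback(reader, buf)
```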
(Rietveld is giving me errors, so I'm replying here)
On 2014/04/13 02:22:23, loewis wrote:
>>> Again, why a separate implementation here?
>>
>> For performance reasons. Relying on the default implementation
>> would fall back to using read1(), which means a new bytes object
>> is created first.
>
> Hmm.
> a) if performance was relevant, it should apply to readinto() as well.
I didn't even notice the readinto implementation was missing. But I
I'm very sorry, but I still don't see which code in readinto1() is
I posted a small benchmark to the issue tracker. Personally, I think
(Yes, I did put performance first in my last reply, but only because I
Yes - but I don't quite understand why it matters (if you need read1/readinto1, you cannot just use read/readinto instead).
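The difference can be illustrated with a raw stream that hands out data in small pieces. This is a sketch only: `ChunkyRaw` is a hypothetical class invented for the illustration, not something in the standard library.

```python
import io


class ChunkyRaw(io.RawIOBase):
    """Hypothetical raw stream returning at most `chunk` bytes per raw read."""

    def __init__(self, data, chunk):
        self._data = data
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, b):
        n = min(len(b), self._chunk, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


data = b"abcdefghijklmnopqrstuvwxyz"

# read() keeps issuing raw reads until the request is satisfied (or EOF).
full = io.BufferedReader(ChunkyRaw(data, 3)).read(10)

# read1() performs at most one raw read, so it returns only what that
# single read delivered; readinto1() follows the same rule.
single = io.BufferedReader(ChunkyRaw(data, 3)).read1(10)
```

On a socket-like stream, the `read(10)` call could block waiting for more data, which is why it cannot stand in for `read1`.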
C readinto1: 4.638e-01 seconds
That shows that the Python readinto is definitely not up to par and could use improvement as well. Is that what you're getting at?
Maybe this is why we seem to be talking past each other :-). I did not look at or work on readinto at all. All I noticed is that there is a read1, but no readinto1. So I implemented a readinto1 as well as I could.
I see. It's not actually true that there is no readinto - it's inherited from the base class.
I think it is more important that the implementation is consistent than that it is performant (but achieving both should be possible).
Whether or not _pyio needs to be performant, I don't know. Having it consistent with _io would be desirable, but might not be possible.
Attached is an updated patch that
Performance of the _pyio implementation on my system is:
Thanks for taking the time, and apologies about the test failure. I was probably too eager and ran only the test_io suite instead of everything.
I looked at the failure, and the problem is that the default Python BufferedIOBase.readinto implementation is semi-broken. It should work with any object implementing the memoryview protocol (like the C implementation), but it really only works with bytearray objects. The testIteration test only worked (prior to the patch) because there is a special case for array objects with format 'b':
In other words, trying to read into any other object has always failed. In particular, even format code 'B' fails:
>>> import _pyio
>>> from array import array
>>> buf = array('b', b'x' * 10)
>>> _pyio.open('/dev/zero', 'rb').readinto(buf)
10
>>> buf = array('B', b'x' * 10)
>>> _pyio.open('/dev/zero', 'rb').readinto(buf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 1096, in readinto
    buf[:len_] = array.array('b', buf2)
TypeError: bad argument type for built-in operation
The readinto implementation that my patch adds for BufferedReader does not contain this special case, and therefore with the patch even the test with a 'b'-array fails.
For now, I've added the same special casing of 'b'-type arrays to the _readinto() implementation in BufferedReader. This fixes the immediate problem (and this time I definitely ran the entire testsuite).
However, the fix is certainly not what I would consider a good solution. I guess that would be better addressed by a separate patch that also fixes the same issue in BufferedIOBase?
I used the wrong interpreter when cutting and pasting the example above. Here's the correct version, to avoid confusion with the traceback:
>>> import _pyio
>>> from array import array
>>> buf = array('b', b'x' * 10)
>>> _pyio.open('/dev/zero', 'rb').readinto(buf)
10
>>> buf = array('B', b'x' * 10)
>>> _pyio.open('/dev/zero', 'rb').readinto(buf)
Traceback (most recent call last):
  File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 662, in readinto
    b[:n] = data
TypeError: can only assign array (not "bytes") to array slice

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 667, in readinto
    b[:n] = array.array('b', data)
TypeError: bad argument type for built-in operation
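One way to avoid special-casing array type codes would be to copy through a byte-oriented memoryview of the target. The following is only a sketch of that idea, not the fix from any attached patch, and `copy_into` is a hypothetical helper name.

```python
from array import array


def copy_into(target, data):
    """Copy bytes into any writable, contiguous buffer object.

    Casting the memoryview to format 'B' exposes the target as plain
    bytes, so the copy no longer depends on the array's type code.
    """
    view = memoryview(target).cast("B")
    n = min(len(view), len(data))
    view[:n] = data[:n]
    return n


buf = array("B", b"x" * 10)

# Direct slice assignment reproduces the first TypeError shown above.
try:
    buf[:4] = b"abcd"
    direct_assignment_failed = False
except TypeError:
    direct_assignment_failed = True

# Going through the memoryview works for 'B' ...
copied = copy_into(buf, b"abcd")

# ... and for item types wider than one byte.
ints = array("i", [0, 0])
copy_into(ints, b"\xff" * ints.itemsize)
```

The cast requires a C-contiguous buffer, so a non-contiguous memoryview as the target would still need separate handling.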