This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Should we allow "bytearray" #27

Closed
GoogleCodeExporter opened this issue Apr 10, 2015 · 9 comments

Comments

@GoogleCodeExporter

We have to decide what to do with bytearray in
Transport.write() and sendto() methods.

Should we allow bytearray?

Original issue reported on code.google.com by fafhr...@gmail.com on 27 Mar 2013 at 11:41

@GoogleCodeExporter

I think we should support bytearray in all places where we support bytes, but 
in all cases where we hold on to it after returning from the function we should 
make a copy -- either extend another bytearray or append a bytes() copy to a 
list.
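A minimal sketch of that policy (a hypothetical buffering writer, not asyncio's actual transport code; the class and method names here are assumptions for illustration):

```python
class BufferedWriter:
    """Sketch: accept bytes or bytearray, but snapshot mutable input."""

    def __init__(self):
        self._buffer = []  # list of immutable bytes chunks

    def write(self, data):
        if isinstance(data, bytearray):
            # Copy before holding on to it, so a later mutation by the
            # caller cannot change what we eventually send.
            data = bytes(data)
        elif not isinstance(data, bytes):
            raise TypeError(f"expected bytes or bytearray, got {type(data).__name__}")
        self._buffer.append(data)

    def flush(self):
        out = b''.join(self._buffer)
        self._buffer.clear()
        return out
```

The copy in `write()` is exactly the "append a bytes() copy to a list" option; extending a single shared bytearray would copy just as implicitly.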

Original comment by gvanrossum@gmail.com on 27 Mar 2013 at 11:43

@GoogleCodeExporter

Original comment by gvanrossum@gmail.com on 16 Oct 2013 at 4:42

  • Added labels: Priority-Low
  • Removed labels: Priority-Medium

@GoogleCodeExporter

Issue 87 has been merged into this issue.

Original comment by gvanrossum@gmail.com on 21 Nov 2013 at 12:54

@GoogleCodeExporter

To explain the issue to the creator of issue 87: if we allowed bytearray or 
e.g. memoryview or arrays, we'd have to copy them before storing them in the 
buffer, because otherwise the bytearray's contents might change while we're 
waiting for the I/O to happen. Given that we have to copy anyway, the caller 
might as well do that -- and this restriction on the API then becomes a gentle 
hint that using bytearray may actually be *less* efficient than using bytes.

So I am actually thinking of closing this as won't fix, but I'll entertain some 
discussion first.

(Note that you may be able to avoid the copy if the send() happens inline, 
because in that case the data is never added to the buffer; but we only do 
that for TCP and UDP using selectors -- we always append to the buffer when 
using SSL/TLS, and the Windows Proactor also always appends to the buffer. So, 
again, the seeming optimization might backfire depending on your transport type.)
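The hazard described above takes only a few lines to demonstrate: if a transport stored the caller's bytearray without copying, a later mutation by the caller would silently change the queued data (plain Python, no asyncio involved):

```python
buffer = []                         # stand-in for a transport's write buffer
data = bytearray(b'GET / HTTP/1.1\r\n')

buffer.append(data)                 # stored without a copy: aliases the caller's object
data[:5] = b'POST '                 # caller reuses the bytearray before the I/O happens
assert buffer[0].startswith(b'POST')      # the queued data silently changed

buffer.clear()
buffer.append(bytes(data))          # copying decouples the queued data
data[:5] = b'GET /'                 # later mutation...
assert buffer[0].startswith(b'POST')      # ...no longer affects what was queued
```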

Original comment by gvanrossum@gmail.com on 21 Nov 2013 at 1:01

@GoogleCodeExporter

In my case, I use bytearray for performance reasons, so I wouldn't want an 
extra copy every time I send data. Would it make sense to also accept 
memoryview objects in all write methods? This would also avoid the slice copy 
when the socket only writes part of the bytearray. I would leave it to the 
user not to alter the data until the bytearray has been completely sent. A new 
method on Transport to wait for all writers on Transport._sock to be cleared 
would help. If the idea makes sense, I can submit a patch for it.
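The zero-copy property being argued for here -- and the aliasing risk that comes with it -- can both be seen directly:

```python
ba = bytearray(b'abcdefgh')
mv = memoryview(ba)

# Slicing a bytes object copies; slicing a memoryview does not.
rest = mv[3:]              # zero-copy view of the "unsent" tail
assert rest.obj is ba      # still backed by the original bytearray

# ...which is exactly why it is unsafe for a transport to keep it
# without copying: the caller can still mutate the underlying buffer.
ba[3] = ord('X')
assert bytes(rest[:1]) == b'X'
```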

Original comment by llpam...@gmail.com on 21 Nov 2013 at 1:35

@GoogleCodeExporter

Can you tell me how you create the bytearray? Do you know how much the copy 
would cost you? (It's easy enough to benchmark asyncio with the assert taken 
out if you want to. :-)

Leaving the user the responsibility to not modify the data is unsatisfactory -- 
plus, the user could modify the *size* of the data which could break 
invariants/assumptions of the transport.

Waiting until the buffer is completely flushed can be done using a combination 
of calling transport.set_write_buffer_limits(0) and defining a 
protocol.resume_writing() callback. (Check the definition of drain() in 
streams.py for a usage example.)
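The flow-control handshake behind that suggestion can be sketched as follows (this is a simplified illustration of the pause/resume pattern, not asyncio's actual streams.py code; `drain` here is a hand-rolled stand-in):

```python
import asyncio


class FlowControlProtocol(asyncio.Protocol):
    """Sketch of the pause/resume handshake behind a drain()-style wait."""

    def __init__(self):
        self._paused = False
        self._drain_waiter = None

    def pause_writing(self):
        # Called by the transport when its buffer exceeds the high-water mark.
        self._paused = True

    def resume_writing(self):
        # Called by the transport when the buffer drains below the low-water
        # mark; with set_write_buffer_limits(0) that means "buffer is empty".
        self._paused = False
        if self._drain_waiter is not None:
            self._drain_waiter.set_result(None)
            self._drain_waiter = None

    async def drain(self):
        if self._paused:
            self._drain_waiter = asyncio.get_running_loop().create_future()
            await self._drain_waiter


async def main():
    proto = FlowControlProtocol()
    proto.pause_writing()                         # transport reports a full buffer
    waiter = asyncio.ensure_future(proto.drain())
    await asyncio.sleep(0)                        # let drain() start waiting
    proto.resume_writing()                        # transport reports the buffer flushed
    await waiter                                  # drain() completes
    return proto._paused


asyncio.run(main())
```

Setting the buffer limits to zero makes `resume_writing()` fire only once the buffer is completely empty, which is what gives the caller a "fully sent" signal.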

Original comment by gvanrossum@gmail.com on 21 Nov 2013 at 4:48

@GoogleCodeExporter

I'll do the benchmark. 

In the meantime, I also noticed that you cannot write memoryview objects to a 
transport, due to the "b''.join()" line concatenating the bytes in the buffer. 
I solved that by patching the _write_ready method as follows:

    def _write_ready(self):
        while self._buffer:
            data = self._buffer.popleft()
            try:
                n = self._sock.send(data)
            except (BlockingIOError, InterruptedError):
                self._buffer.appendleft(data)  # Still need to write this.
                break
            except Exception as exc:
                self._loop.remove_writer(self._sock_fd)
                self._fatal_error(exc)
                return  # transport is broken; don't fall through to resume
            else:
                data = data[n:]
                if data:
                    self._buffer.appendleft(data)  # Still need to write this.
                    break

        self._maybe_resume_protocol()
        if not self._buffer:
            self._loop.remove_writer(self._sock_fd)
            if self._closing:
                self._call_connection_lost(None)
            elif self._eof:
                self._sock.shutdown(socket.SHUT_WR)

It avoids some extra copies when the buffer grows.
Does it break any internal compatibility?

Original comment by llpam...@gmail.com on 21 Nov 2013 at 6:57

@GoogleCodeExporter

There's movement here. We may end up changing the buffer to be a bytearray 
rather than a deque. See recent discussion on python-tulip list and this code 
review: https://codereview.appspot.com/33490043/
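The buffer change under discussion would look roughly like this (a sketch of the idea, not the actual patch from the code review):

```python
class ByteArrayBuffer:
    """Sketch: a write buffer kept as one bytearray instead of a deque of chunks."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        # Extending a bytearray copies `data` by construction, so the
        # caller's bytes/bytearray/memoryview is never aliased.
        self._buffer += data

    def consume(self, sent):
        # Drop the sent prefix in place -- no b''.join() over chunks,
        # and no per-chunk slice copies on partial sends.
        del self._buffer[:sent]

    def pending(self):
        return bytes(self._buffer)
```

This naturally accepts any bytes-like input (including memoryview) while keeping the copy-on-write guarantee, which is how a bytearray buffer addresses both halves of this issue at once.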

Original comment by gvanrossum@gmail.com on 27 Nov 2013 at 5:36

@GoogleCodeExporter

This issue was closed by revision c50e09feb51f.

Original comment by gvanrossum@gmail.com on 27 Nov 2013 at 10:13

  • Changed state: Fixed
