Avoiding copies #35

bluetech · 2017-03-17T21:46:04Z

These are some uneducated musings (I am not really knowledgeable about network programming).

Let's say we are writing a simple-minded server over a TCP socket. For simplicity I am only talking about the receive path, not the send path. The basic flow using h11 is:

1. recv from the socket into a buffer.
2. Pass the data to h11, which copies it into its own buffer.

A side note about the first step (not the topic of this issue): this is usually done using data = socket.recv(BUF_SIZE). From my reading of the CPython code, the way Python does this magic is that it just allocates a fresh buffer of size BUF_SIZE and reads into that. If BUF_SIZE - 1 < SMALL_REQUEST_THRESHOLD (= 512), this might come from some memory pool, otherwise it's just malloc. To a C programmer, this seems very wasteful. But fortunately it seems possible to reuse a buffer by doing something like this (I haven't checked to see if it actually makes a difference):

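recvbuf = bytearray(BUF_SIZE)
recvview = memoryview(recvbuf)  # view over recvbuf, so slicing below does not copy

while True:
    # socket is an already-connected socket object
    nread = socket.recv_into(recvbuf)  # fills recvbuf in place
    if nread == 0:  # peer closed the connection
        break
    data = recvview[:nread]  # only the bytes just received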

What I do want to talk about (and is actually relevant to h11...) is the fact that we make two copies of the data: kernel -> buffer, buffer -> h11. It seems natural to ask: can we reduce this to one copy?

I don't think this is a pressing issue for h11, as any copying overhead is pretty minor compared to other overhead, currently. But it seems interesting to ask in relation to the sans-io methodology in general.

One way I imagine this could work, without inverting the logic again and losing the advantage of sans-io, is to have a way for h11 itself to provide a buffer for the application to recv into. For example, the application tells h11 how much it wants to recv, and gets back a memoryview of h11's buffer. Then it receives into that and tells h11 how much it read. But there are probably better ways.
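To make the idea concrete, here is a rough sketch of what such a buffer-lending interface could look like. Everything in it is hypothetical: `LendingBuffer`, `get_writable`, `commit` and `data` are names made up for illustration and are not part of h11's API. The sketch assumes a fixed-capacity buffer and ignores parsing and compaction entirely.

```python
import socket


class LendingBuffer:
    """Hypothetical parser-side buffer; NOT part of h11's API."""

    def __init__(self, capacity):
        self._storage = bytearray(capacity)
        self._view = memoryview(self._storage)
        self._used = 0  # number of valid bytes at the start of _storage

    def get_writable(self, want):
        # Lend out up to `want` bytes of free space after the valid data.
        end = min(self._used + want, len(self._storage))
        return self._view[self._used:end]

    def commit(self, nread):
        # The application reports how many bytes it actually received.
        self._used += nread

    def data(self):
        # Copies; only here so the demo can inspect the contents.
        return bytes(self._view[:self._used])


def demo():
    a, b = socket.socketpair()
    try:
        a.sendall(b"GET / HTTP/1.1\r\n")
        buf = LendingBuffer(4096)
        window = buf.get_writable(1024)
        # The kernel writes straight into the parser's storage.
        nread = b.recv_into(window)
        buf.commit(nread)
        return buf.data()
    finally:
        a.close()
        b.close()
```

The application still owns the socket and the I/O call, so the sans-io split is preserved; the only change is who owns the memory being written into.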


njsmith · 2017-03-18T02:52:07Z

Yeah, it's an interesting thing to think about. I'm not sure there's much point in worrying about these copies, for several reasons. First, almost anything in Python is slow compared to memcpy; e.g., even setting up a memoryview object requires allocating memory (for the object itself). Avoiding copies can certainly still be a win for large buffers, especially in cases where it allows you to avoid quadratic slowdown, but the benefits are surprisingly situational. Plus, if you look at where the data goes after you pass it to h11... h11 doesn't keep around a pre-allocated buffer either; it uses a bytearray as a "moving buffer" where we constantly append new data on the end and delete it from the beginning. Since bytearray's underlying storage is contiguous, this means that we end up reallocating and copying the data as we go. bytearray is clever enough to amortize these costs so that we only end up copying any given piece of data a small constant number of times at worst (and h11.ReceiveBuffer has some logic to make sure of this even on versions of Python where bytearray is less clever), but it still means that we don't have an obvious empty buffer to pass to recv_into, and that even if we did we still wouldn't be "zero copy".
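For readers who haven't seen the pattern, a "moving buffer" on top of a bytearray can be sketched roughly as below. This is an illustration only: `MovingBuffer`, `append` and `take` are made-up names, and h11's real ReceiveBuffer differs in detail.

```python
class MovingBuffer:
    """Illustrative only; not h11's actual ReceiveBuffer."""

    def __init__(self):
        self._data = bytearray()

    def append(self, chunk):
        # Appending may reallocate and copy the bytes already stored.
        self._data += chunk

    def take(self, n):
        out = bytes(self._data[:n])  # copy the consumed prefix out
        del self._data[:n]  # drop consumed bytes from the front
        return out

    def __len__(self):
        return len(self._data)


buf = MovingBuffer()
buf.append(b"GET / HT")
buf.append(b"TP/1.1\r\nHost: x\r\n")
line = buf.take(16)  # the first 16 bytes, spanning both appended chunks
```

Because the stored bytes keep shifting as data is appended and consumed, there is no stable, empty region that could be handed out for recv_into.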

The other option in principle would be to use a fancier buffer structure, like a linked list of "chunks", or a ring buffer. But the problem with these kinds of constructs is that they require significant amounts of Python-level logic to construct, search through, etc. h11's strategy for speed is to use as little Python as possible – it leans heavily on C methods like bytes.find, bytes.split, and regexes for parsing. But all of these require a single contiguous buffer, like a bytearray, so we'd have to reimplement them in Python, which almost certainly would cost more than we'd gain from avoiding a few memcpys.
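As a rough illustration of that trade-off, compare searching for a delimiter in one contiguous buffer with searching across a list of chunks. Both function names are made up for this sketch, and the chunked version is just one possible approach, not something h11 does.

```python
def find_contiguous(buf, needle):
    # One call into C; works because bytearray storage is contiguous.
    return buf.find(needle)


def find_chunked(chunks, needle):
    # Pure-Python search over a list of chunks. A match may straddle a
    # chunk boundary, so the last len(needle) - 1 bytes are carried over.
    offset = 0  # total length of the chunks already scanned
    carry = b""
    for chunk in chunks:
        window = carry + chunk  # note: this join is itself a copy
        idx = window.find(needle)
        if idx != -1:
            return offset - len(carry) + idx
        keep = len(needle) - 1
        carry = window[-keep:] if keep else b""
        offset += len(chunk)
    return -1
```

Even this small helper adds a Python-level loop, offset bookkeeping and a copy per chunk, which is the kind of overhead the contiguous bytearray avoids.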

All in all, it seems difficult to get a meaningful advantage from using recv_into in h11. If you (or anyone) want to experiment with it then I'll be interested to see the results though :-).

(Note though that h11 does support zero-copy sends of data by using send_with_data_passthrough + socket.sendmsg.)
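The socket side of that can be sketched without h11 at all. In the snippet below, the `pieces` list is a hand-written stand-in for the kind of list send_with_data_passthrough is meant to return (framing bytes plus the caller's own data object); `send_pieces` and `demo` are illustrative names, and socket.sendmsg is only available on Unix-like platforms.

```python
import socket


def send_pieces(sock, pieces):
    # socket.sendmsg accepts an iterable of bytes-like objects and hands
    # them to the kernel in one call, without joining them in Python first.
    # It may send only part of the data; a real caller must check the
    # return value and resend the remainder.
    return sock.sendmsg(pieces)


def demo():
    a, b = socket.socketpair()
    try:
        body = bytearray(b"hello world")  # application-owned payload
        # Stand-in for h11's output: chunk framing around the payload.
        pieces = [b"b\r\n", memoryview(body), b"\r\n"]
        sent = send_pieces(a, pieces)
        return sent, b.recv(4096)
    finally:
        a.close()
        b.close()
```

The payload is never concatenated with the framing bytes in Python, which is what avoids the extra copy on the send path.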

njsmith · 2018-10-30T05:21:21Z

For the reasons discussed above, I don't think there are any practical changes we can make here in the short-to-medium term, so closing.

bluetech mentioned this issue Mar 17, 2017

Remove slow bytesify calls when they are avoidable #34

Merged

njsmith closed this as completed Oct 30, 2018
