Fixed to read out all data for buffering in BufferedSocket when Data Frame payload is quite large #80

t2y · 2015-01-25T19:25:16Z

Could you confirm whether accessing www.google.co.uk would occur as below?

My environment is OS X Yosemite 10.10.1 (x86_64) and use MacPorts to install Python 3.4.2.

(hyper)$ python
Python 3.4.2 (default, Nov 23 2014, 23:10:23) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.54)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from hyper import HTTP20Connection
>>> conn = HTTP20Connection('www.google.co.uk')
>>> conn.request('GET', '/')
1
>>> response = conn.getresponse()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../hyper/http20/connection.py", line 171, in getresponse
    return HTTP20Response(stream.getheaders(), stream)
  File ".../hyper/http20/stream.py", line 293, in getheaders
    self._recv_cb()
  File ".../hyper/http20/connection.py", line 541, in _recv_cb
    self._consume_single_frame()
  File ".../hyper/http20/connection.py", line 483, in _consume_single_frame
    frame, length = Frame.parse_frame_header(header)
  File ".../hyper/http20/frame.py", line 52, in parse_frame_header
    frame = FRAMES[type](stream_id)
KeyError: 97

The buffering issue in BufferedSocket causes this error. In my environment self._sck.recv_into(_buffer) read data only 1135 bytes in spite of _buffer size is 65535 bytes when target length (amt) is 16384 bytes. I'm not sure whether socket.recv() would ensure to read the buffer given length.

$ man 2 recv
     ssize_t
     recv(int socket, void *buffer, size_t length, int flags);

So, I changed to read data from socket until count == 0 (it means no data) or self._bytes_in_buffer == amt.

Lukasa · 2015-01-25T19:54:53Z

So this is a tricky issue.

recv_into is guaranteed to return as much data as it can into the size of the buffer. What it does not do is block the read until it can fill the buffer. This means that, when you call recv_into the socket layer will copy as much data out of its buffer as it can into the buffer provided.

In the case in question, we have room in the buffer and amt is large, but smaller than the room in the buffer. In principle, therefore, recv_into could copy amt bytes. However, in this specific case the OS layer hasn't received all those bytes yet: it's only received 1135 of them. The rest are still in transit from Google to you.

Because of the way recv_into is written, rather than blocking until all those bytes are received it returns immediately with what it has.

You're correct that the BufferedSocket implementation currently doesn't account for this. It should, but doing so requires that we think very carefully. That's because the BufferedSocket has a very notable extra property, which is that it allows for 'optimistic readahead'.

That's implemented in hyper.http20.connection.HTTP20Connection._recv_cb. This confirms that a socket has data available to read, and if it does will pop frames off the connection. It does this to make sure that we see control frames early, but it must not block. If it blocks here, we cause users to have a severe and negative experience, as their request blocks behind pumping other incomplete frames off the connection.

This change as-is would remove that function, because any frame being read must be read it its entirety. What we need, I think, is a non_blocking flag on BufferedSocket.recv() that will circumvent this loop: essentially, if the socket doesn't have data to read and there isn't enough data in the buffer, return nothing. Does that sound like an extension you could make?

t2y · 2015-01-25T20:24:39Z

Thank you for complaisant explanation!

I have learned the socket behavior about what was happening. I understand that improving the loop function in hyper.http20.connection.HTTP20Connection._recv_cb is better. Then, I agree with BufferedSocket must not block.

I'll investigate more and re-implement this solution.

Lukasa · 2015-01-25T23:14:18Z

Thanks @t2y! Your work here is extremely valued. =) 🍰

t2y · 2015-01-27T07:50:55Z

I re-implemented without changing BufferedSocket behavior. However, there's something wrong in testing?

t2y · 2015-01-27T07:58:45Z

ah, maybe it would need coverage 100. I'll try.

$ coverage report -m --fail-under 100

t2y · 2015-01-27T08:12:36Z

Before I write some test code, is my changes what you expect?

Lukasa · 2015-01-27T10:23:07Z

@t2y It's closer! I think a few small changes and we should get there.

Three notes. First, the count variable in _recv_cb is important to ensure that a single stream doesn't get locked into a tight loop of consuming frames. We should keep it. =)

The second problem is that this change allows a DoS attack if the server emits part of a frame and blocks on the rest. That's probably OK in an early version, but we should note it. You don't need to make a change here, just add a comment to mention the DoS risk.

The final problem is that we do a lot of copying of data in this line:

data = memoryview(data.tobytes() + _data.tobytes())

I think at this point we may want to refactor the code a bit to avoid the copies. I suggest using an array.array() that is length long, and then having a single memoryview into which we read the data. That should minimise the copies. It's obviously not perfect, but it's good enough for now.

Let me know if it's not clear what that last point means and I'll write a bit of sample code to show what I mean. =)

Again, @t2y, thanks for this! You're doing fantastic work!

t2y · 2015-01-27T10:40:37Z

First, the count variable in _recv_cb is important to ensure that a single stream doesn't get locked into a tight loop of consuming frames. We should keep it. =)

oops! I'll revert the changes.

The second problem

I'd like you to write the comment after I finished to fix. Because, I'm not native English writer, so please 😓

The final problem is that we do a lot of copying of data in this line:

Yes. I think so, too. I'll improve it.

Lukasa · 2015-01-27T10:47:15Z

@t2y I'm happy to write the comment for you. =)

t2y · 2015-01-27T11:56:30Z

I suggest using an array.array() that is length long, and then having a single memoryview into which we read the data.

According to https://docs.python.org/3.4/library/array.html, it seems that array module cannot handle bytes data. So, I make bytearray with memoryview and handle it. I guess it's better than previous one.

Lukasa · 2015-01-27T13:16:59Z

Alright, now we're really getting close! This would be merge-able as-is, but I'd like us to try to simplify the code a bit now, so that I can remember what it's doing when I come back in a few months.

I think that, rather than having an if followed by a while, we should just use a while. This means that you need to initialize the buffer and buffer_view ahead of loop. Then, the case where length == 0 becomes a trivial sub-case of the main loop. =)

The second thing is that we should avoid the use of .to_bytes() when copying from one memoryview to another. Instead, you should use the [:] construct to perform the copy. For example, this:

_buffer_view[_end:_end + data_length] = _data.tobytes()

becomes this:

_buffer_view[_end:_end + data_length] = _data[:]

This avoids creating an intermediate bytes object that needs to immediately be garbage collected. In principle, this function shouldn't ever need to call .to_bytes(), because we're always working with memoryviews.

Again, please let me know if any of that was unclear. =)

t2y · 2015-01-27T14:31:17Z

I think that, rather than having an if followed by a while, we should just use a while.

You mean like this?

    def _recv_payload(self, length):                                                
        if not length:                                                              
            return memoryview(b'')                                                  

        buffer = bytearray(length)                                                  
        buffer_view = memoryview(buffer)                                            
        index = 0
        # _sock.recv(length) might not read out all data if the given length is
        # very large. So it should be to retrieve from socket repeatedly.
        while True:                                                                 
            data = self._sock.recv(length)                                          
            data_length = len(data)                                                 
            end = index + data_length                                               
            buffer_view[index:end] = data[:]                                        
            length -= data_length                                                   
            if length == 0 or data_length == 0:
                break                                                               
            index = end                                                             

        return buffer_view[:end]

copying from one memoryview to another. Instead, you should use the [:]

Oh, sounds good. I learned an efficient idiom.

Lukasa · 2015-01-28T08:13:18Z

Oh, sounds good. I learned an efficient idiom.

Glad to hear it! memoryviews are a fun part of Python that most people don't seem to know about, so this will be a useful skill for you.

As for your loop, I think we can go one better. Rather than using while True with an if...break at the bottom, you can hoist the if test to be the while test (inverting the logic). Thus, your

while True:
    ...
    if length == 0 or data_length == 0:
        break

becomes

while length and data_length:
    ...

…te large like SETTINGS_MAX_FRAME_SIZE

t2y · 2015-01-28T08:44:00Z

memoryviews are a fun part of Python that most people don't seem to know about, so this will be a useful skill for you.

Exactly, that excites me.

you can hoist the if test to be the while test (inverting the logic). Thus, your

Absolutely, I didn't notice at that time. 😓 You give me awareness and it is helpful. Try again.

    def _recv_payload(self, length):                                                
        if not length:                                                              
            return memoryview(b'')                                                  

        buffer = bytearray(length)                                                  
        buffer_view = memoryview(buffer)                                            
        index = 0                                                                   
        data_length = -1                                                            
        # _sock.recv(length) might not read out all data if the given length        
        # is very large. So it should be to retrieve from socket repeatedly.        
        while length and data_length:                                               
            data = self._sock.recv(length)                                          
            data_length = len(data)                                                 
            end = index + data_length                                               
            buffer_view[index:end] = data[:]                                        
            length -= data_length                                                   
            index = end                                                             

        return buffer_view[:end]

Lukasa · 2015-01-28T10:33:43Z

@t2y Hurrah! This looks great! 🍰 Thank you so much.

Fixed to read out all data for buffering in BufferedSocket when Data Frame payload is quite large

Lukasa · 2015-01-28T10:40:04Z

FYI, you've earned your contributors spot here. =D 🍰 ⭐

t2y · 2015-01-28T11:06:02Z

I'm learning http2 with pleasure. Thank you, too! 😄

t2y force-pushed the development branch from e8a92de to b5c4e4c Compare January 27, 2015 07:35

t2y force-pushed the development branch 2 times, most recently from 7a79c61 to 8d83109 Compare January 27, 2015 11:44

t2y force-pushed the development branch from 8d83109 to 9bc162e Compare January 27, 2015 15:02

Fixed to read out all data from socket when Data Frame payload is qui…

40d65b5

…te large like SETTINGS_MAX_FRAME_SIZE

t2y force-pushed the development branch from 9bc162e to 40d65b5 Compare January 28, 2015 08:34

Lukasa added a commit that referenced this pull request Jan 28, 2015

Merge pull request #80 from t2y/development

82a2f33

Fixed to read out all data for buffering in BufferedSocket when Data Frame payload is quite large

Lukasa merged commit 82a2f33 into python-hyper:development Jan 28, 2015

Lukasa added a commit that referenced this pull request Jan 28, 2015

Docstrings and comment updates for #80.

08a74e4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed to read out all data for buffering in BufferedSocket when Data Frame payload is quite large #80

Fixed to read out all data for buffering in BufferedSocket when Data Frame payload is quite large #80

t2y commented Jan 25, 2015

Lukasa commented Jan 25, 2015

t2y commented Jan 25, 2015

Lukasa commented Jan 25, 2015

t2y commented Jan 27, 2015

t2y commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 28, 2015

t2y commented Jan 28, 2015

Lukasa commented Jan 28, 2015

Lukasa commented Jan 28, 2015

t2y commented Jan 28, 2015

Fixed to read out all data for buffering in BufferedSocket when Data Frame payload is quite large #80

Fixed to read out all data for buffering in BufferedSocket when Data Frame payload is quite large #80

Conversation

t2y commented Jan 25, 2015

Lukasa commented Jan 25, 2015

t2y commented Jan 25, 2015

Lukasa commented Jan 25, 2015

t2y commented Jan 27, 2015

t2y commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 27, 2015

t2y commented Jan 27, 2015

Lukasa commented Jan 28, 2015

t2y commented Jan 28, 2015

Lukasa commented Jan 28, 2015

Lukasa commented Jan 28, 2015

t2y commented Jan 28, 2015