Incorrect connection error on implicit stream close from window update #312

Closed
rbtying opened this issue Sep 18, 2018 · 16 comments · Fixed by #313
Comments

@rbtying
Contributor

rbtying commented Sep 18, 2018

The server is allowed to send window updates for streams which are (locally) already closed, which is handled here:

https://github.com/carllerche/h2/blob/master/src/proto/streams/streams.rs#L398

However, in the case where the received ID is higher than self.next_stream_id, this causes the connection to error with Reason::PROTOCOL_ERROR, rather than implicitly closing any idle streams.

According to the HTTP/2 spec,

WINDOW_UPDATE can be sent by a peer that has sent a frame bearing the END_STREAM flag. This means that a receiver could receive a WINDOW_UPDATE frame on a "half-closed (remote)" or "closed" stream. A receiver MUST NOT treat this as an error (see Section 5.1).

In particular, treating it as a connection error causes all simultaneous requests to fail the next time we poll the connection, even if they would have succeeded.
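
For concreteness, here is a minimal sketch of the handling the spec describes, using simplified stand-in types rather than h2's actual `Recv`/`Streams` code: a WINDOW_UPDATE on a stream we have already closed is ignored, while one on a genuinely idle stream may be escalated to a connection error.

```rust
// Simplified sketch; `StreamState` and `recv_window_update` are stand-ins,
// not h2's real types.
#[derive(Debug, PartialEq)]
enum StreamState {
    Idle,
    Open { send_window: u32 },
    Closed,
}

fn recv_window_update(state: &mut StreamState, increment: u32) -> Result<(), &'static str> {
    match state {
        // Stream is open: apply the increment to our send window.
        StreamState::Open { send_window } => {
            *send_window = send_window.saturating_add(increment);
            Ok(())
        }
        // Stream already closed locally: the peer may still send WINDOW_UPDATE;
        // per RFC 7540 §6.9 this MUST NOT be treated as an error.
        StreamState::Closed => Ok(()),
        // Genuinely idle stream: §5.1 permits a connection error of type
        // PROTOCOL_ERROR, which is the branch this issue is about.
        StreamState::Idle => Err("PROTOCOL_ERROR"),
    }
}

fn main() {
    let mut closed = StreamState::Closed;
    // The case from this issue: must be ignored, not kill the connection.
    assert_eq!(recv_window_update(&mut closed, 2_147_418_111), Ok(()));
}
```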

@seanmonstar
Member

Hm, the part that you cite doesn't seem to mention the issue you've described, right? You're receiving a WINDOW_UPDATE frame with a stream ID higher than anything the connection has seen so far, correct?

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

In the case we've encountered, we've seen the stream ID before, since we've just closed the stream (we just sent a few megabytes of data upstream). However, regardless, we should never set a connection error based off of a window update frame, right?

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

Note: https://github.com/carllerche/h2/blob/master/src/proto/streams/streams.rs#L382 is where we branch on whether or not the stream has already been closed.

@seanmonstar
Member

I believe this is the relevant part that explains why it's currently a connection error:

idle:
All streams start in the "idle" state.
[...]
Receiving any frame other than HEADERS or PRIORITY on a stream in this state MUST be treated as a connection error (Section 5.4.1) of type PROTOCOL_ERROR.

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

In this case, the stream is not in the idle state -- it's been locally closed. It might suffice in this case to not call ensure_not_idle if we've ever heard of an equal or higher stream ID?

My interpretation of the flow of events is as follows:

  1. stream 61 is opened (and idle)
  2. we send data through stream 61
  3. server sends us window update on stream 61
  4. we close stream 61 because we're done sending data through it
  5. we receive window update on stream 61. Since it's closed, we fall through to the ensure_not_idle function.

The question, then, is why that's triggering any error at all (since next_stream_id should be 63)... and even if it were to trigger an error, if we don't know that stream 61 is idle, we should not return a protocol error.

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

I have a local repro that (roughly) amounts to muxing a bunch of nontrivial uploads on the same connection, where the underlying internet connection is slow. Example trace:

2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop process - 1 events, 140.221µs
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop poll - 5.828µs
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop time - Instant { t: 81168988057429 }
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop process - 0 events, 14.818µs
2018-09-18 15:41:50 TRACE [h2::proto::streams::recv] release_capacity; size=113
2018-09-18 15:41:50 TRACE [h2::proto::streams::recv] release_connection_capacity; size=113
2018-09-18 15:41:50 TRACE [tower_h2::recv_body] Data::drop: releasing capacity: 0
2018-09-18 15:41:50 TRACE [h2::proto::streams::recv] release_capacity; size=0
2018-09-18 15:41:50 TRACE [h2::proto::streams::recv] release_connection_capacity; size=0
2018-09-18 15:41:50 TRACE [h2::proto::streams::streams] drop_stream_ref; stream=Stream { id: StreamId(121), state: State { inner: Closed(EndStream) }, is_counted: false, ref_count: 2, next_pending_send: None, is_pending_send: false, send_flow: FlowControl { window_size: Window(65448), available: Window(0) }, requested_send_capacity: 0, buffered_send_data: 0, send_task: None, pending_send: Deque { indices: None }, next_pending_send_capacity: None, is_pending_send_capacity: false, send_capacity_inc: true, next_open: None, is_pending_open: false, next_pending_accept: None, is_pending_accept: false, recv_flow: FlowControl { window_size: Window(16777103), available: Window(16777216) }, in_flight_recv_data: 0, next_window_update: None, is_pending_window_update: false, reset_at: None, next_reset_expire: None, pending_recv: Deque { indices: Some(Indices { head: 49, tail: 49 }) }, recv_task: None, pending_push_promises: Queue { indices: None, _p: PhantomData }, content_length: Omitted }
2018-09-18 15:41:50 TRACE [h2::proto::streams::counts] transition_after; stream=StreamId(121); state=State { inner: Closed(EndStream) }; is_closed=true; pending_send_empty=true; buffered_send_data=0; num_recv=0; num_send=59
2018-09-18 15:41:50 TRACE [h2::proto::streams::streams] drop_stream_ref; stream=Stream { id: StreamId(121), state: State { inner: Closed(EndStream) }, is_counted: false, ref_count: 1, next_pending_send: None, is_pending_send: false, send_flow: FlowControl { window_size: Window(65448), available: Window(0) }, requested_send_capacity: 0, buffered_send_data: 0, send_task: None, pending_send: Deque { indices: None }, next_pending_send_capacity: None, is_pending_send_capacity: false, send_capacity_inc: true, next_open: None, is_pending_open: false, next_pending_accept: None, is_pending_accept: false, recv_flow: FlowControl { window_size: Window(16777103), available: Window(16777216) }, in_flight_recv_data: 0, next_window_update: None, is_pending_window_update: false, reset_at: None, next_reset_expire: None, pending_recv: Deque { indices: Some(Indices { head: 49, tail: 49 }) }, recv_task: None, pending_push_promises: Queue { indices: None, _p: PhantomData }, content_length: Omitted }
2018-09-18 15:41:50 TRACE [h2::proto::streams::counts] transition_after; stream=StreamId(121); state=State { inner: Closed(EndStream) }; is_closed=true; pending_send_empty=true; buffered_send_data=0; num_recv=0; num_send=59
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop poll - 550.437µs
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop time - Instant { t: 81168988629428 }
2018-09-18 15:41:50 TRACE [tokio_core::reactor] event Readable | Writable Token(2)
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] notifying a task handle
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] notifying a task handle
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop process - 1 events, 66.408µs
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop poll - 14.818µs
2018-09-18 15:41:50 DEBUG [tokio_core::reactor] loop time - Instant { t: 81168988719111 }
2018-09-18 15:41:50 TRACE [tokio_core::reactor] event Writable Token(2)
2018-09-18 15:41:50 TRACE [tokio_core::reactor] event Readable Token(7)
2018-09-18 15:41:50 TRACE [h2::proto::settings] send_pending_ack; pending=None
2018-09-18 15:41:50 TRACE [h2::codec::framed_read] poll
2018-09-18 15:41:50 TRACE [tokio_io::framed_read] attempting to decode a frame
2018-09-18 15:41:50 TRACE [tokio_io::framed_read] frame decoded from buffer
2018-09-18 15:41:50 TRACE [h2::codec::framed_read] poll; bytes=13B
2018-09-18 15:41:50 TRACE [h2::codec::framed_read] decoding frame from 13B
2018-09-18 15:41:50 TRACE [h2::codec::framed_read]     -> kind=WindowUpdate
2018-09-18 15:41:50 DEBUG [h2::codec::framed_read] received; frame=Frame::WindowUpdate(WindowUpdate { stream_id: StreamId(121), size_increment: 2147418111 })
2018-09-18 15:41:50 TRACE [h2::proto::connection] recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(121), size_increment: 2147418111 }
2018-09-18 15:41:50 TRACE [h2::proto::streams::recv] stream ID implicitly closed
2018-09-18 15:41:50 DEBUG [h2::proto::connection] Connection::poll; err=PROTOCOL_ERROR
2018-09-18 15:41:50 TRACE [h2::proto::streams::state] recv_err; err=Proto(PROTOCOL_ERROR)
2018-09-18 15:41:50 TRACE [h2::proto::streams::prioritize] clear_queue; stream-id=StreamId(239)
2018-09-18 15:41:50 TRACE [h2::proto::streams::prioritize] assign_connection_capacity; inc=0

@seanmonstar
Member

In this case, the stream is not in the idle state -- it's been locally closed.

That does seem incorrect, then. What you propose makes sense; would you be up for submitting a PR (even better if it has a unit test)?

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

Yeah, I can put together a PR. Might need to do some more reading to determine how exactly we represent closed/idle states in the connection abstraction.

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

Real interesting: next_stream_id in this case appears to be 2!!?

@seanmonstar
Member

I don't see what you mean about it being 2, but if the local side is a server, then that seems likely, since servers always create even numbered streams (when they create brand NEW streams, which is only done via push promises).
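
For reference, here is the stream-ID parity rule from RFC 7540 §5.1.1 that this hinges on, as a tiny illustrative helper (not part of h2's API):

```rust
// Illustrative helper, not part of h2's API.
/// Stream 0 is the connection itself; odd IDs are client-initiated,
/// even IDs are server-initiated (servers only open new streams via PUSH_PROMISE).
fn initiated_by_client(stream_id: u32) -> Option<bool> {
    match stream_id {
        0 => None,                // connection-level, not a stream
        id => Some(id % 2 == 1),  // odd = client, even = server
    }
}

fn main() {
    assert_eq!(initiated_by_client(121), Some(true)); // the stream in the trace above
    assert_eq!(initiated_by_client(2), Some(false));  // first possible server-initiated stream
}
```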

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

In this case it's a client, which is why it's confusing -- there should be no push promises involved at all in this log. In particular, I would have expected next_stream_id to be some odd number >= 121.

@seanmonstar
Member

I've looked through the logs you pasted, but I must be missing it. I don't see next_stream_id anywhere.

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

Yeah, it's not printed out in master -- I modified it to dump next_stream_id and got 2 as its current value.

Unfortunately the full log dump is in the tens of MB, so it's a bit of a pain to upload -- do you have a location where you'd prefer me to post it?

Relevant subsection

2018-09-18 16:15:06 TRACE [h2::proto::settings] send_pending_ack; pending=None
2018-09-18 16:15:06 TRACE [h2::codec::framed_read] poll
2018-09-18 16:15:06 TRACE [tokio_io::framed_read] attempting to decode a frame
2018-09-18 16:15:06 TRACE [tokio_io::framed_read] frame decoded from buffer
2018-09-18 16:15:06 TRACE [h2::codec::framed_read] poll; bytes=13B
2018-09-18 16:15:06 TRACE [h2::codec::framed_read] decoding frame from 13B
2018-09-18 16:15:06 TRACE [h2::codec::framed_read]     -> kind=WindowUpdate
2018-09-18 16:15:06 DEBUG [h2::codec::framed_read] received; frame=Frame::WindowUpdate(WindowUpdate { stream_id: StreamId(121), size_increment: 2147418111 })
2018-09-18 16:15:06 TRACE [h2::proto::connection] recv WINDOW_UPDATE; frame=WindowUpdate { stream_id: StreamId(121), size_increment: 2147418111 }
2018-09-18 16:15:06 TRACE [h2::proto::streams::recv] stream ID implicitly closed StreamId(121) >= StreamId(2)
2018-09-18 16:15:06 DEBUG [h2::proto::connection] Connection::poll; err=PROTOCOL_ERROR
2018-09-18 16:15:06 TRACE [h2::proto::streams::state] recv_err; err=Proto(PROTOCOL_ERROR)
2018-09-18 16:15:06 TRACE [h2::proto::streams::prioritize] clear_queue; stream-id=StreamId(239)


where the patch is

--- a/rust_vendor/h2-0.1.12/src/proto/streams/recv.rs
+++ b/rust_vendor/h2-0.1.12/src/proto/streams/recv.rs
@@ -571,7 +571,7 @@ impl Recv {
     pub fn ensure_not_idle(&self, id: StreamId) -> Result<(), Reason> {
         if let Ok(next) = self.next_stream_id {
             if id >= next {
-                trace!("stream ID implicitly closed");
+                trace!("stream ID implicitly closed {:?} >= {:?}", id, next);
                 return Err(Reason::PROTOCOL_ERROR);
             }
         }

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

Okay, if I'm understanding things correctly, this is what happens:

The client makes n requests to the server. These requests get stream IDs (1, 3, ..., 2n-1). However, we only update next_stream_id on stream reset and on receiving headers when the store returns Vacant for find_entry, so next_stream_id is still set to the default value of 2.

We don't encounter this in the common case because (usually) we would expect to finish sending the request in full prior to receiving any headers, so we'd see Vacant in the store. However, in this case, I think the server is sending the headers prior to receiving the entirety of the request.

Now, the server is allowed to send us window updates for any of the stream IDs (1, 3, ..., 2n-1), including after we have closed those streams locally (i.e. after those requests have completed and the closed streams have been dropped from the store). Since next_stream_id is still 2, we incorrectly presume that we have never seen a stream with a higher ID and error the connection.
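
A small self-contained model of that failure mode (illustrative types only, mirroring the shape of the ensure_not_idle check quoted in the patch above): the client's recv-side next_stream_id never advances past its initial value of 2, so a WINDOW_UPDATE for the already-dropped stream 121 is misclassified as targeting an idle stream.

```rust
// Illustrative model only -- not h2's actual Recv type.
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
struct StreamId(u32);

struct RecvSide {
    // Tracks streams the *peer* opens toward us. A client that never receives
    // push promises leaves this at its initial value of StreamId(2).
    next_stream_id: StreamId,
}

impl RecvSide {
    // Mirrors the shape of recv.rs::ensure_not_idle shown in the patch above:
    // any ID >= next_stream_id is classified as "idle".
    fn ensure_not_idle(&self, id: StreamId) -> Result<(), &'static str> {
        if id >= self.next_stream_id {
            return Err("PROTOCOL_ERROR");
        }
        Ok(())
    }
}

fn main() {
    let recv = RecvSide { next_stream_id: StreamId(2) };
    // Stream 121 was opened by *us*, already closed, and dropped from the
    // store, so the WINDOW_UPDATE lookup falls through to this check -- and
    // the stream is misclassified as idle.
    assert_eq!(recv.ensure_not_idle(StreamId(121)), Err("PROTOCOL_ERROR"));
}
```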

@seanmonstar
Member

I believe the bug is actually that at this line, https://github.com/carllerche/h2/blob/master/src/proto/streams/streams.rs#L397, the wrong "side" is checked. If you look at the rest of the function, receiving a WINDOW_UPDATE frame means adjusting our send knowledge, but if the stream isn't in the store, it asks the recv knowledge whether the ID is OK, which is wrong. The other side doesn't send us window updates about streams they open (our recv).
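
A rough sketch of the shape of that proposed fix, using stand-in SendSide/RecvSide structs rather than h2's internal types: when a WINDOW_UPDATE arrives for a stream that is no longer in the store, the idle check should consult the send side (streams we opened), not the recv side (streams the peer opened).

```rust
// Stand-in types; names are illustrative, not h2's real internals.
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
struct StreamId(u32);

struct SendSide { next_stream_id: StreamId } // streams *we* have opened
struct RecvSide { next_stream_id: StreamId } // streams the *peer* has opened

impl SendSide {
    fn ensure_not_idle(&self, id: StreamId) -> Result<(), &'static str> {
        if id >= self.next_stream_id { Err("PROTOCOL_ERROR") } else { Ok(()) }
    }
}

// A WINDOW_UPDATE adjusts *our* send window, so when the stream is missing
// from the store, the idle check should ask the send side.
fn window_update_for_missing_stream(send: &SendSide, id: StreamId) -> Result<(), &'static str> {
    send.ensure_not_idle(id)
}

fn main() {
    let send = SendSide { next_stream_id: StreamId(123) }; // we have opened 1, 3, ..., 121
    let recv = RecvSide { next_stream_id: StreamId(2) };   // a client that never sees pushes
    // Asking the send side classifies stream 121 correctly: closed, not idle.
    assert_eq!(window_update_for_missing_stream(&send, StreamId(121)), Ok(()));
    // The buggy path effectively asks the recv side instead, where 121 >= 2
    // makes the stream look idle and the connection errors.
    assert!(StreamId(121) >= recv.next_stream_id);
}
```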

@rbtying
Contributor Author

rbtying commented Sep 18, 2018

hm, that makes sense!

Unfortunately my local repro is now failing at a different (earlier) stage, but I'll make that change and see if it makes the problem go away.

Thoughts on writing an appropriate unit test?
