New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server sending websocket frame too quickly after upgrade causes client disconnect #2173

Closed
jjongsma opened this Issue Jan 31, 2014 · 16 comments

Comments

Projects
None yet
5 participants
@jjongsma

jjongsma commented Jan 31, 2014

I have a server that broadcasts a "capabilities" message to clients immediately upon successful connect. 9 out of 10 times, this causes the client to disconnect. What is happening is this:

  1. Client sends upgrade request
  2. Server sends upgrade response to client, fires local HANDSHAKE_COMPLETED event
  3. Upstream server handler hears HANDSHAKE_COMPLETED event, and sends a websocket frame to the client
  4. Client receives upgrade response (HTTP 101)
  5. Client receives first websocket frame before #4 reaches websocket handshaker
  6. HttpClientCodec throws exception here while processing websocket frame, causing websocket handshaker to catch the exception and close the connection
  7. Client handshaker processes upgrade response and removes HttpClientCodec (oops, too late)

The only solutions currently are to have the server delay some arbitrary amount of time (bad, and not guaranteed to fix) or forcing the client to initiate all communication (better, but not really feasible given our deployments).

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Jan 31, 2014

Member

@jjongsma which version of netty you re using ?

Member

normanmaurer commented Jan 31, 2014

@jjongsma which version of netty you re using ?

@jjongsma

This comment has been minimized.

Show comment
Hide comment
@jjongsma

jjongsma Jan 31, 2014

4.0.14.Final. I didn't have a simple test case that would let me test later versions without updating a ton of dependencies. If there have been changes in .15 or .16 you think may have fixed this, let me know and I'll try to verify with a simple case.

jjongsma commented Jan 31, 2014

4.0.14.Final. I didn't have a simple test case that would let me test later versions without updating a ton of dependencies. If there have been changes in .15 or .16 you think may have fixed this, let me know and I'll try to verify with a simple case.

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Feb 1, 2014

Member

@jjongsma no I don't think there are any chances that would affect this. I need to think about how to write a test case for this

Member

normanmaurer commented Feb 1, 2014

@jjongsma no I don't think there are any chances that would affect this. I need to think about how to write a test case for this

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Feb 3, 2014

Member

@jjongsma I'm a bit unsure how this could happen. Can you show me your ChannelPipeline setup ? And also the code which acts on the WebSocketClientHandshaker ?

Member

normanmaurer commented Feb 3, 2014

@jjongsma I'm a bit unsure how this could happen. Can you show me your ChannelPipeline setup ? And also the code which acts on the WebSocketClientHandshaker ?

@jjongsma

This comment has been minimized.

Show comment
Hide comment
@jjongsma

jjongsma Feb 3, 2014

The production code is a little sprawling so I'm trying to put together a simple test case. Debugging clearly shows two calls to HttpClientCodec.decode() before the websocket handshaker sees the first one, but it's not happening with a simpler case so far.

When you say you're unsure how this could happen, do you mean that the architecture of Netty should prevent the flow described above? Is a second message allowed to enter a channel pipeline before a previous one exits?

jjongsma commented Feb 3, 2014

The production code is a little sprawling so I'm trying to put together a simple test case. Debugging clearly shows two calls to HttpClientCodec.decode() before the websocket handshaker sees the first one, but it's not happening with a simpler case so far.

When you say you're unsure how this could happen, do you mean that the architecture of Netty should prevent the flow described above? Is a second message allowed to enter a channel pipeline before a previous one exits?

@jjongsma

This comment has been minimized.

Show comment
Hide comment
@jjongsma

jjongsma Feb 3, 2014

@normanmaurer Looks like it is a bug in how HttpClientCodec handles extra content in the buffer after decoding an Upgrade response. It tries to do the right thing, but the outer loop in ReplayingDecoder blows up anyways when the output buffer size doesn't change.

Took me awhile to reproduce via a test case because the first request after the server starts almost never fails even though subsequent requests almost always do.

Test case / steps to reproduce: https://gist.github.com/jjongsma/8792858

jjongsma commented Feb 3, 2014

@normanmaurer Looks like it is a bug in how HttpClientCodec handles extra content in the buffer after decoding an Upgrade response. It tries to do the right thing, but the outer loop in ReplayingDecoder blows up anyways when the output buffer size doesn't change.

Took me awhile to reproduce via a test case because the first request after the server starts almost never fails even though subsequent requests almost always do.

Test case / steps to reproduce: https://gist.github.com/jjongsma/8792858

jjongsma pushed a commit to barchart/barchart-netty4 that referenced this issue Feb 3, 2014

@npordash

This comment has been minimized.

Show comment
Hide comment
@npordash

npordash Feb 4, 2014

I can confirm this behavior as well. I'm currently using 4.0.12.Final with success but if I try to upgrade to 4.0.14/15 exceptions get thrown in the way @jjongsma is describing.

It is actually pretty easy to reproduce using the WebSocketClient/Server example in the project.

In WebSocketServerHandler.handleHttpRequest just sync on the completion of the handshake and then immediately write and flush a frame.

Before:

if (handshaker == null) {
  WebSocketServerHandshakerFactory.sendUnsupportedVersionResponse(ctx.channel());
} else {
  handshaker.handshake(ctx.channel(), req);
}

After:

if (handshaker == null) {
  WebSocketServerHandshakerFactory.sendUnsupportedVersionResponse(ctx.channel());
} else {
  handshaker.handshake(ctx.channel(), req).syncUninterruptibly();
  ctx.channel().writeAndFlush(new PingWebSocketFrame());
}

Run the client a few times against the server (since it may not happen every time) and you'll eventually see it.

npordash commented Feb 4, 2014

I can confirm this behavior as well. I'm currently using 4.0.12.Final with success but if I try to upgrade to 4.0.14/15 exceptions get thrown in the way @jjongsma is describing.

It is actually pretty easy to reproduce using the WebSocketClient/Server example in the project.

In WebSocketServerHandler.handleHttpRequest just sync on the completion of the handshake and then immediately write and flush a frame.

Before:

if (handshaker == null) {
  WebSocketServerHandshakerFactory.sendUnsupportedVersionResponse(ctx.channel());
} else {
  handshaker.handshake(ctx.channel(), req);
}

After:

if (handshaker == null) {
  WebSocketServerHandshakerFactory.sendUnsupportedVersionResponse(ctx.channel());
} else {
  handshaker.handshake(ctx.channel(), req).syncUninterruptibly();
  ctx.channel().writeAndFlush(new PingWebSocketFrame());
}

Run the client a few times against the server (since it may not happen every time) and you'll eventually see it.

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Feb 5, 2014

Member

@trustin could you take a look. I think this regression slipped in because of your change how 101 is handled

Member

normanmaurer commented Feb 5, 2014

@trustin could you take a look. I think this regression slipped in because of your change how 101 is handled

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Feb 6, 2014

Member

@npordash I can't reproduce it so far even with your change. But I think I know how to fix the bug. Could you test a fix for me once I'm done ?

Member

normanmaurer commented Feb 6, 2014

@npordash I can't reproduce it so far even with your change. But I think I know how to fix the bug. Could you test a fix for me once I'm done ?

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Feb 6, 2014

Member

@npordash nevermind... I tested with my fix in place, without it I see the problem. So let me cleanup stuff and push the fix :)

Member

normanmaurer commented Feb 6, 2014

@npordash nevermind... I tested with my fix in place, without it I see the problem. So let me cleanup stuff and push the fix :)

normanmaurer pushed a commit that referenced this issue Feb 6, 2014

Norman Maurer
[#2173] Fix regression that let HttpRequestDecoder fail if the websoc…
…ket response and a websocketframe are send in one go
@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Feb 6, 2014

Member

@jjongsma @npordash fixed and test-cases added. Could you guys test with 4.0.16.Final-SNAPSHOT ?

Member

normanmaurer commented Feb 6, 2014

@jjongsma @npordash fixed and test-cases added. Could you guys test with 4.0.16.Final-SNAPSHOT ?

@normanmaurer normanmaurer added the defect label Feb 6, 2014

@normanmaurer normanmaurer self-assigned this Feb 6, 2014

normanmaurer pushed a commit that referenced this issue Feb 6, 2014

Norman Maurer
[#2173] Fix regression that let HttpRequestDecoder fail if the websoc…
…ket response and a websocketframe are send in one go
@trustin

This comment has been minimized.

Show comment
Hide comment
@trustin

trustin Feb 6, 2014

Member

Thanks @normanmaurer for taking care of this.

Member

trustin commented Feb 6, 2014

Thanks @normanmaurer for taking care of this.

@jjongsma

This comment has been minimized.

Show comment
Hide comment
@jjongsma

jjongsma Feb 6, 2014

Confirmed fixed for my test case.

jjongsma commented Feb 6, 2014

Confirmed fixed for my test case.

@normanmaurer

This comment has been minimized.

Show comment
Hide comment
@normanmaurer

normanmaurer Feb 6, 2014

Member

@jjongsma thanks for verify!

Member

normanmaurer commented Feb 6, 2014

@jjongsma thanks for verify!

@npordash

This comment has been minimized.

Show comment
Hide comment
@npordash

npordash Feb 6, 2014

@normanmaurer Thanks Norman! Verified on my end as well :)

npordash commented Feb 6, 2014

@normanmaurer Thanks Norman! Verified on my end as well :)

trustin added a commit that referenced this issue Jul 28, 2014

Fix a bug where the data that follows an HTTP upgrade response is lost
Related issue: #2179 and #2173

Motivation:

When a WebSocket server responds to a WebSocket handshake request with
its first WebSocket frame, it is sometimes swallowed by
HttpMessageDecoder.  This issue has been fixed by
4f6ccbb in 4 and 5, but was not.

Modification:

Backport the UPGRADED state and the fix for #2173 to HttpMessageDecoder

Result:

The data sent by the server following an HTTP upgrade response is not
lost anymore.
@xmulzz

This comment has been minimized.

Show comment
Hide comment
@xmulzz

xmulzz Oct 3, 2014

I'm still having this kind of issue when I use 5.0.0.Alpha1. Could anyone do me a favor look at this?

io.netty.handler.codec.DecoderException: HttpClientCodec$Decoder.decode() must consume the inbound data or change its state if it did not decode anything.
at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:374)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:140)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:74)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:138)
at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:127)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:794)
at java.lang.Thread.run(Unknown Source)

xmulzz commented Oct 3, 2014

I'm still having this kind of issue when I use 5.0.0.Alpha1. Could anyone do me a favor look at this?

io.netty.handler.codec.DecoderException: HttpClientCodec$Decoder.decode() must consume the inbound data or change its state if it did not decode anything.
at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:374)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:140)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:74)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:138)
at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:127)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:794)
at java.lang.Thread.run(Unknown Source)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment