
SSL_read failed: OpenSSL error #6481

Closed
rkapsi opened this issue Mar 1, 2017 · 26 comments
@rkapsi (Member) commented Mar 1, 2017

Netty 4.1.9-SNAPSHOT with netty-tcnative 2.0.0.Beta6

I'll try to provide a repro, but I'm seeing the following error in our application when I upgrade to Netty 4.1.9-SNAPSHOT and netty-tcnative 2.0.0.Beta6.

From what I can tell, the SSL handshake completes, I receive the HTTP request, and I respond with ctx.writeAndFlush(FullHttpResponse). I see the response data in curl, but it cuts off randomly and both ends report the following errors:

// Server: Netty w/ OpenSSL on port 8443
ReferenceCountedOpenSslEngine DEBUG: SSL_read failed: OpenSSL error: error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version
// Client: curl -k https://localhost:8443
curl: (56) SSL read: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number, errno 0

Another observation is that curl seems to be a bit more stable: some requests do succeed. Browsers (I tested Chrome and FF), on the other hand, fail pretty reliably (every time). The problem goes away as soon as I switch back to Netty 4.1.8.

This resembles what we observed in the #6466 ticket (minus the Exception).

@Scottmitch (Member) commented Mar 2, 2017

Netty 4.1.9-SNAPSHOT

What commit hash are you using?
What version of OpenSSL?
Is there anything interesting about the payload/scenario that you think may help reproduce? For example, do you have to repeat the request many times, pipelining, large headers, large response, multiple connections, are you using SNI (or other TLS extensions), etc.?

@Scottmitch (Member)

Can you reproduce with a modified version of the HTTP hello world example?

@Scottmitch (Member) commented Mar 2, 2017

Can you also include some verbose output from curl -iv?

Can you reproduce with openssl s_client -host localhost -port 8443 (then enter GET / HTTP/1.1 after the handshake)?

@normanmaurer (Member)

@rkapsi also a big thank you for trying the snapshots and providing feedback!

Scottmitch self-assigned this Mar 2, 2017
@Scottmitch (Member)

Also, do you enable SSL_MODE_ENABLE_PARTIAL_WRITE in Netty?

@rkapsi (Member, Author) commented Mar 2, 2017

Still trying to come up with a reproducer, but I've narrowed it down to a commit.

WORKS: 66b17310418230d12c21f1f91287c22022d8d847 w/ 1.1.33.Fork26 (February 1, 2017)
WORKS: 1d128c7a6594e1ffd986fee9723c4305082e23cf w/ 2.0.0.Beta1 (February 2, 2017)
WORKS: 6353c229fd11c6e6306011d55993775f7685d122 w/ 2.0.0.Beta1 (February 7, 2017)
WORKS: 007048dddda0eb922952f8b07ea2521d9a5eadef w/ 2.0.0.Beta1 (February 9, 2017)
FAILS: d06990f43458e9ac7cea803269e83d072ada61f6 w/ 2.0.0.Final-SNAPSHOT* (February 9, 2017)
FAILS: 9c03d49f144f15b796294f2221c85cfe1a3c73a6 w/ 2.0.0.Beta2 (February 10, 2017)

* missing artifact, used 2.0.0.Beta2 (it doesn't compile with Beta1)

So it's either d06990f or 2.0.0.Beta2.

I do not enable SSL_MODE_ENABLE_PARTIAL_WRITE. Is that something new?

Other random blurbs: I'm working on something like an HTTP cache that uses CompositeByteBufs to aggregate the HttpContent's bytes. The first curl request succeeds (passing through the HttpContent bytes; aggregation happens on the side) and subsequent requests fail (this time returning the aggregated data). If I take out the caching layer, curl appears to be fine, but Google Chrome and FF still fail. It seems to have something to do with timing and/or the size of the ByteBufs that are being passed around.

@rkapsi (Member, Author) commented Mar 2, 2017

This is how it looks in curl:

(screenshots: head and tail of the truncated curl output)

@johnou (Contributor) commented Mar 2, 2017

@rkapsi pro tip: you can drag and drop images straight into the comment input.

@Scottmitch (Member)

@rkapsi - Thanks for the info. d06990f doesn't surprise me, as there were big changes to tcnative and Netty's OpenSslEngine. There were also some follow-up commits to fix things on both sides, but if you are seeing the same behavior with this older code, that is good to know.

@Scottmitch (Member)

I do not enable SSL_MODE_ENABLE_PARTIAL_WRITE. Is that something new?

Not new ... but just trying to understand the use case. We may not properly support this flag yet (until PR #6365)

Also the image link you provided in #6481 (comment) gives a 404.

@rkapsi (Member, Author) commented Mar 2, 2017

@Scottmitch I uploaded the images straight into the ticket, as per @johnou's tip.

@johnou (Contributor) commented Mar 3, 2017

@Scottmitch is #6488 related?

@Scottmitch (Member)

@johnou - Not in this specific case, no. I plan to coordinate more with @rkapsi tomorrow and resolve this ASAP.

@rkapsi (Member, Author) commented Mar 3, 2017

Managed to reproduce it by constructing a CompositeByteBuf the way my custom HTTP response aggregation would do it.

import javax.net.ssl.SSLEngine;

import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.CompositeByteBuf;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpHeaderValues;
import io.netty.handler.codec.http.HttpHeaders;
import io.netty.handler.codec.http.HttpRequest;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.handler.codec.http.HttpVersion;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslHandler;
import io.netty.handler.ssl.SslProvider;
import io.netty.handler.ssl.util.SelfSignedCertificate;
import io.netty.util.ReferenceCountUtil;

public class Issue6481 {

  private static final int PORT = 9443;
  
  public static void main(String[] args) throws Exception {
    
    EventLoopGroup group = new NioEventLoopGroup();
    
    Channel channel = newHttpsServer(group, PORT);
    
    System.out.println("\n>>> Try 'curl -v -k https://localhost:" + PORT + "' <<<\n");
  }
  
  private static Channel newHttpsServer(EventLoopGroup group, int port) throws Exception {
    
    SelfSignedCertificate cert = new SelfSignedCertificate();
    
    SslContext context = SslContextBuilder.forServer(cert.certificate(), cert.privateKey())
        .sslProvider(SslProvider.OPENSSL)
        .build();
    
    ServerBootstrap bootstrap = new ServerBootstrap()
        .channel(NioServerSocketChannel.class)
        .group(group)
        .option(ChannelOption.SO_REUSEADDR, true)
        .childHandler(new HttpsServer(context));

    return bootstrap.bind(port)
        .syncUninterruptibly()
        .channel();
  }

  private static class HttpsServer extends ChannelInitializer<Channel> {

    private final SslContext context;

    public HttpsServer(SslContext context) {
      this.context = context;
    }

    @Override
    protected void initChannel(Channel ch) throws Exception {
      ChannelPipeline pipeline = ch.pipeline();

      SSLEngine engine = context.newEngine(ch.alloc());
      pipeline.addLast(new SslHandler(engine));
      pipeline.addLast(new HttpServerCodec());
      
      pipeline.addLast(new HttpRequestHandler());
    }

    private static class HttpRequestHandler extends ChannelInboundHandlerAdapter {
      
      private static final FullHttpResponse RESPONSE = newResponse(ByteBufAllocator.DEFAULT);
      
      static {
        System.out.println(">>> RESPONSE: " + RESPONSE + "\n");
        
        CompositeByteBuf content = (CompositeByteBuf)RESPONSE.content();
        content.forEach((buf) -> {
          System.out.println(">>> COMPONENT: " + buf + "\n");
        });
      }
      
      @Override
      public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        ReferenceCountUtil.release(msg);

        if (msg instanceof HttpRequest) {
          FullHttpResponse response = RESPONSE.retainedDuplicate();
          System.out.println(">>> response: " + response + "\n");
          
          ctx.writeAndFlush(response)
              .addListener(ChannelFutureListener.CLOSE);
        }
      }
      
      private static FullHttpResponse newResponse(ByteBufAllocator alloc) {
        ByteBuf input = alloc.buffer();
        input.writeBytes(new byte[40279]);
        
        CompositeByteBuf content = alloc.compositeBuffer();
        content.addComponent(true, input.readRetainedSlice(469));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(1024));
        content.addComponent(true, input.readRetainedSlice(352));
        content.addComponent(true, input.readRetainedSlice(16384));
        content.addComponent(true, input.readRetainedSlice(8896));
        content.addComponent(true, input.readRetainedSlice(2914));
        
        // Make sure we've consumed all the bytes
        if (content.readableBytes() != input.readerIndex()) {
          throw new IllegalStateException(content.readableBytes() + " vs. " + input.readerIndex());
        }
        
        // This shouldn't release the input
        if (input.release()) {
          throw new IllegalStateException();
        }
        
        FullHttpResponse response = new DefaultFullHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK, content);
        HttpHeaders headers = response.headers();
        
        headers.set(HttpHeaderNames.CONTENT_TYPE, HttpHeaderValues.APPLICATION_OCTET_STREAM);
        headers.set(HttpHeaderNames.CONTENT_LENGTH, content.readableBytes());
        headers.set(HttpHeaderNames.CONNECTION, HttpHeaderValues.CLOSE);
        
        return response;
      }
    }
  }
}
$ curl -v -k https://localhost:9443
* Rebuilt URL to: https://localhost:9443/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=example.com
*  start date: Mar  3 14:00:18 2016 GMT
*  expire date: Dec 31 23:59:59 9999 GMT
*  issuer: CN=example.com
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
> GET / HTTP/1.1
> Host: localhost:9443
> User-Agent: curl/7.52.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/octet-stream
< content-length: 40279
< connection: close
< 
* TLSv1.2 (OUT), TLS alert, Server hello (2):
* SSL read: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number, errno 0
* Curl_http_done: called premature == 1
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, Client hello (1):
curl: (56) SSL read: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number, errno 0

@normanmaurer (Member)

@rkapsi thanks!

@johnou (Contributor) commented Mar 3, 2017

@rkapsi you rock.

@rkapsi (Member, Author) commented Mar 3, 2017

Running this code against 4.1.8 and 1.1.33.Fork26, or 007048d and 2.0.0.Beta1, works.

@normanmaurer (Member)

@rkapsi looking atm... this definitely has something to do with the CompositeByteBuf handling.

@rkapsi (Member, Author) commented Mar 3, 2017

@normanmaurer 👍

That repro represents the "HTTP response cache" scenario (doing my own CompositeByteBuf stuff) I mentioned earlier in the thread. But our Netty server is also a client that fetches data from an upstream server (using HTTP), and HttpObjectDecoder, or rather ByteToMessageDecoder, internally uses CompositeByteBufs for cumulation. Whether cumulation kicks in in that scenario depends highly on the consumer of these bytes, and that's why curl seemed to work but browsers would fail (due to different buffers etc.).
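The cumulation effect described above can be sketched without Netty (plain Java, illustrative only; the class and method names are made up for this sketch): a merge-style cumulator copies arriving chunks into one contiguous buffer, while a composite-style cumulator keeps each chunk as a separate component, and it is the number of components that a later wrap operation sees.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not Netty code: contrast merge-style and
// composite-style cumulation of incoming read chunks.
public class Cumulation {

    // Merge cumulation: copy everything into one contiguous buffer,
    // so downstream always sees a single component.
    static byte[] merge(byte[] acc, byte[] chunk) {
        byte[] out = new byte[acc.length + chunk.length];
        System.arraycopy(acc, 0, out, 0, acc.length);
        System.arraycopy(chunk, 0, out, acc.length, chunk.length);
        return out;
    }

    // Composite cumulation: keep each arriving chunk as its own component,
    // so the component count depends on how the reads happened to arrive.
    static List<byte[]> composite(List<byte[]> acc, byte[] chunk) {
        acc.add(chunk);
        return acc;
    }

    public static void main(String[] args) {
        byte[] merged = merge(new byte[3], new byte[5]);
        List<byte[]> composed = new ArrayList<>();
        composite(composed, new byte[3]);
        composite(composed, new byte[5]);
        // Same 8 payload bytes, but one component vs. two.
        System.out.println(merged.length + " bytes in 1 component");
        System.out.println("8 bytes in " + composed.size() + " components");
    }
}
```

This is why the failure depended on the peer's read pattern: different clients caused different chunk boundaries, hence different component counts.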

@Scottmitch (Member)

thanks @rkapsi ... looking now.

@Scottmitch (Member)

@rkapsi - Can you verify PR #6492?

@rkapsi (Member, Author) commented Mar 3, 2017

@Scottmitch unfortunately negative. The reproducer succeeds, but our server is still failing, in particular when I use a browser.

@rkapsi (Member, Author) commented Mar 3, 2017

@Scottmitch it works when I disable H2. I guess that is the difference between using curl and a browser.

CompositeByteBuf inside CompositeByteBuf? I recall Netty's H2 code doing its own round of aggregation.

Scottmitch added a commit to Scottmitch/netty that referenced this issue Mar 4, 2017
…r size

Motivation:
When we do a wrap operation we calculate the maximum size of the destination buffer ahead of time, and return a BUFFER_OVERFLOW exception if the destination buffer is not big enough. However, if there is a CompositeByteBuf the wrap operation may consist of multiple ByteBuffers, and each incurs its own overhead during encryption. We currently don't account for the encryption overhead when there are multiple ByteBuffers; we assume the overhead will only apply once to the entire input size. If there is not enough room to write an entire encrypted packet into the BIO, SSL_write will return -1 despite having actually written content to the BIO. We then attempt to retry the write with a bigger buffer, but because SSL_write is stateful, the remaining bytes from the previous operation are put into the BIO. This results in the second half of the encrypted data being sent to the peer, which is not of proper format; the peer will be confused and ultimately not get the expected data (which may result in a fatal error). In this case, because SSL_write returns -1, we have no way to know how many bytes were actually consumed, so the best we can do is ensure that we always allocate a destination buffer with enough space to guarantee the write operation completes synchronously.

Modifications:
- SslHandler#allocateNetBuf should take into account how many ByteBuffers will be wrapped and apply the encryption overhead for each
- Include the TLS header length in the overhead computation

Result:
Fixes netty#6481
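The sizing change in this commit can be illustrated with a small, self-contained sketch (plain Java, not Netty's actual code; the overhead constants below are illustrative assumptions, not Netty's real values):

```java
// Illustrative sketch: estimating the destination buffer size for wrapping
// a CompositeByteBuf whose components may each become a separate SSL_write.
public class WrapBufferSize {

    // Rough per-record TLS overhead: record header plus room for MAC,
    // padding, and IV. Values are illustrative, not taken from Netty.
    static final int TLS_HEADER_LENGTH = 5;
    static final int MAX_ENCRYPTION_OVERHEAD = TLS_HEADER_LENGTH + 64;

    // Before the fix: overhead applied once to the whole pending input,
    // regardless of how many components it is split into.
    static int oldEstimate(int pendingBytes, int numComponents) {
        return pendingBytes + MAX_ENCRYPTION_OVERHEAD;
    }

    // After the fix: each buffered component may be encrypted as its own
    // record, so the overhead must be applied per component.
    static int newEstimate(int pendingBytes, int numComponents) {
        return pendingBytes + MAX_ENCRYPTION_OVERHEAD * numComponents;
    }

    public static void main(String[] args) {
        // 40279 payload bytes in 16 components, like the reproducer above.
        System.out.println(oldEstimate(40279, 16)); // 40348
        System.out.println(newEstimate(40279, 16)); // 41383
    }
}
```

With the old estimate, the BIO could be over a kilobyte too small for a 16-component payload, triggering the partial SSL_write described in the commit message.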
@rkapsi (Member, Author) commented Mar 6, 2017

@Scottmitch commit 677f5e2 appears to work. My tests on Friday were against 5b57fa0508babff04afccc82bdb5cef916b22270.

@johnou (Contributor) commented Mar 6, 2017

🎉

@Scottmitch (Member)

@rkapsi - Thanks for verifying! Sorry about the multiple commits ... had to clean some stuff up along the way.

Scottmitch added a commit that referenced this issue Mar 6, 2017
asajim pushed a commit to delftswa2017/netty that referenced this issue Mar 20, 2017
liuzhengyang pushed a commit to liuzhengyang/netty that referenced this issue Sep 10, 2017
kiril-me pushed a commit to kiril-me/netty that referenced this issue Feb 28, 2018
pulllock pushed a commit to pulllock/netty that referenced this issue Oct 19, 2023