Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes http4s#2744 and http4s#2192 #359

Merged
merged 1 commit into from Nov 26, 2019
Merged

Conversation

sholokhov
Copy link
Contributor

Hi!

Recently we've faced an issue on our production env with http4s client - #647.

I finally got some time to investigate it a bit more and found that the problem was lying in org.http4s.blaze.pipeline.stages.SSLStage#sslHandshake#sslHandshakeLoop in this arm:

case HandshakeStatus.NEED_WRAP =>
    val o = getScratchBuffer(maxBuffer)
    val r = engine.wrap(emptyBuffer, o)
    logger.trace(s"SSL Handshake Status after wrap: $r")
    o.flip()

    if (r.bytesProduced < 1 && r.getHandshakeStatus != HandshakeStatus.FINISHED)
        handshakeFailure(new SSLException(s"SSL Handshake WRAP produced 0 bytes: $r"))

    channelWrite(copyBuffer(o)).onComplete {
        case Success(_) => sslHandshake(data, r.getHandshakeStatus)
        case Failure(t) => handshakeFailure(t)
    }(serialExec)

There was already an attempt to fix it here #258, but the main issue is that after handshakeFailure being called, next channelWrite(...) call sometimes completes with Success arm which leads to infinite recursion.

After spending some time wondering how is it possible (bytesProduced = 0, which should mean connection is closed, but channelWrite reports successful status), I found this JDK regression - https://bugs.openjdk.java.net/browse/JDK-8220703 with the following comment:

The issue for me is that the ssl engine will stay in the state NEED_WRAP, 
when inbound is closed but outbound is not, which is not very intuitive.

The javadoc says for the NEED_WRAP state that: "The SSLEngine must send data 
to the remote side before handshaking can continue, so SSLEngine.wrap() should be 
called." So I call wrap() without application data, but this will make no progress on 
the ssl engine, the state is still NEED_WRAP. What we need to do is wrap application 
data and call outbound close.

It indeed looks like a regression in JDK.

I ran some tests with this fix on our envs and there are some JVM stats (OpenJDK 11, fix was deployed at ~18.30):

Screenshot 2019-11-25 at 10 38 11 PM

According to logs (not included in this PR) it catches the case with infinite recursions. I also haven't spot any resource/memory leakages, so this workaround potentially fixes http4s/http4s#2192 and #647.

Copy link
Member

@rossabaker rossabaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I will publish a release.

@rossabaker rossabaker merged commit 8b05f4c into http4s:master Nov 26, 2019
@satorg
Copy link
Contributor

satorg commented Mar 24, 2020

@sholokhov it seems the issue is still here even with your fix applied.
I keep encountering this error from time to time when I hitting some HTTPS endpoint:

[info] javax.net.ssl.SSLException: SSL Handshake WRAP produced 0 bytes: Status = OK HandshakeStatus = NEED_WRAP
[info] bytesConsumed = 0 bytesProduced = 0
[info] 	at org.http4s.blaze.pipeline.stages.SSLStage.sslHandshakeLoop$1(SSLStage.scala:205)
[info] 	at org.http4s.blaze.pipeline.stages.SSLStage.sslHandshake(SSLStage.scala:231)
[info] 	at org.http4s.blaze.pipeline.stages.SSLStage.org$http4s$blaze$pipeline$stages$SSLStage$$doRead(SSLStage.scala:141)

Can it be something wrong on my side?

openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

scala: 2.13.1
http4s: 0.21.1

@sholokhov
Copy link
Contributor Author

Hi @satorg!

We've never seen this issue anymore after that fix, but we're still running Scala 2.12.10 on OpenJDK 11.0.5 and http4s 0.20.14. I will try to upgrade at least http4s to the latest version and see if I can catch this again.

@satorg
Copy link
Contributor

satorg commented Mar 30, 2020

@sholokhov so what I found out in the last couple of day:

  1. It is not related to Scala version, I tried both 2.12 and 2.13, it doesn't make any difference.
  2. It is related somehow to a particular server. I do see these errors when I hit our GitHub Enterprise APIs, but I cannot reproduce it with the public GitHub APIs (api.github.com).
  3. On the other hand, it is also related to a JDK version. I can only reproduce it with OpenJDK 11 (I tried both 11.0.2 and 11.0.6), but I cannot reproduce it with the more recent OpenJDK 13.0.2.

Unfortunately, I have to deal with our GitHub Enterprise and there's only JDK 11 available in the company's cluster.

@sholokhov
Copy link
Contributor Author

@satorg I’ve managed to migrate our services on latest http4s version, but don’t see any issues so far. Can you may be enable trace logs for blaze and try to catch it once again? Would be nice if you can attach it here so we can get better picture of what’s going on there.

@satorg
Copy link
Contributor

satorg commented Apr 1, 2020

@sholokhov yes, for sure, here it is:
https://gist.github.com/satorg/a2b14cb2f9ae900fce5feb473e40e031#file-194058-log
There's also a test app I wrote which collects these logs:
https://gist.github.com/satorg/a2b14cb2f9ae900fce5feb473e40e031#file-testapp-scala

There're a lot of exceptions like

19:40:59.502 DEBUG [scala-execution-context-global-33] o.http4s.blaze.pipeline.stages.SSLStage: Error while closing SSL Engine
javax.net.ssl.SSLException: closing inbound before receiving peer's close_notify

but as I understand from the sources these can be safely ignored.
The exceptions which matter look like

19:41:15.278 ERROR [scala-execution-context-global-31] org.http4s.client.blaze.Http1Connection: Fatal Error: Error during phase: Initial Read
javax.net.ssl.SSLException: SSL Handshake WRAP produced 0 bytes: Status = OK HandshakeStatus = NEED_WRAP
bytesConsumed = 0 bytesProduced = 0
	at org.http4s.blaze.pipeline.stages.SSLStage.sslHandshakeLoop$1(SSLStage.scala:205)

BTW, it seems I found a workaround to the issue. At least it works for me when I have to use Java11.
In a nutshell, it seems Java11 has some issues with TLSv1.3.
So if I create a custom SSLContext and thus enforce TLSv1.2 (there's an example in the test app)
these errors don't show up.
Neither these errors appear if I run the test app on Java13 – it works perfectly well with both default and custom SSLContext with either of TLS versions enforced.
So it seems Java13 has these TLS issues fixed.

I found an article which gave me a hint (although I'm not completely sure that it is the same issue):
https://webtide.com/openjdk-11-and-tls-1-3-issues/

@rossabaker
Copy link
Member

It sounds like this is distinct from #376, but I wonder if that TLS1.3 implementation is to blame there, as well. We should start testing on the latest JDKs.

@matthughes
Copy link

I'm getting this as well with OpenJDK 11.0.7 and http4s 0.21.3. I was on 11.0.6 but had to upgrade because of a different SSL bug: https://bugs.openjdk.java.net/browse/JDK-8236039 fixed in 11.0.7.

I do find this bug seems a bit sporadic. My guess is that I don't get this when reusing an HTTP connection as subsequent requests don't fail in the same way.

server {"@timestamp":"2020-04-21T10:46:59.657-04:00","@version":"1","message":"Received Request:\n","logger_name":"org.http4s.server.blaze.Http1ServerStage","thread_name":"scala-execution-context-global-73","level":"TRACE","level_value":5000}
server {"@timestamp":"2020-04-21T10:46:59.657-04:00","@version":"1","message":"SSL Read Request Status: Status = BUFFER_UNDERFLOW HandshakeStatus = NOT_HANDSHAKING\nbytesConsumed = 0 bytesProduced = 0, java.nio.HeapByteBuffer[pos=0 lim=16709 cap=16709]","logger_name":"org.http4s.blaze.pipeline.stages.SSLStage","thread_name":"scala-execution-context-global-73","level":"DEBUG","level_value":10000}
server {"@timestamp":"2020-04-21T10:46:59.657-04:00","@version":"1","message":"NIOHeadStage received a read request of size -1","logger_name":"org.http4s.blaze.channel.nio1.NIO1HeadStage","thread_name":"scala-execution-context-global-73","level":"TRACE","level_value":5000}
server {"@timestamp":"2020-04-21T10:46:59.657-04:00","@version":"1","message":"1001453905: doRead completed in 72982ns","logger_name":"org.http4s.blaze.pipeline.stages.SSLStage","thread_name":"scala-execution-context-global-73","level":"TRACE","level_value":5000}
server {"@timestamp":"2020-04-21T10:46:59.658-04:00","@version":"1","message":"Error rendering request","logger_name":"org.http4s.client.blaze.Http1Connection","thread_name":"ioapp-compute-6","level":"ERROR","level_value":40000,"stack_trace":"javax.net.ssl.SSLException: SSL Handshake WRAP produced 0 bytes: Status = OK HandshakeStatus = NEED_WRAP\nbytesConsumed = 0 bytesProduced = 0\n\tat org.http4s.blaze.pipeline.stages.SSLStage.sslHandshakeLoop$1(SSLStage.scala:205)\n\tat org.http4s.blaze.pipeline.stages.SSLStage.sslHandshake(SSLStage.scala:231)\n\tat org.http4s.blaze.pipeline.stages.SSLStage.org$http4s$blaze$pipeline$stages$SSLStage$$doRead(SSLStage.scala:141)\n\tat org.http4s.blaze.pipeline.stages.SSLStage.$anonfun$doRead$1(SSLStage.scala:133)\n\tat org.http4s.blaze.pipeline.stages.SSLStage.$anonfun$doRead$1$adapted(SSLStage.scala:130)\n\tat scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)\n\tat org.http4s.blaze.util.SerialExecutionContext$$anon$1.act(SerialExecutionContext.scala:18)\n\tat org.http4s.blaze.util.SerialExecutionContext$$anon$1.act(SerialExecutionContext.scala:17)\n\tat org.http4s.blaze.util.Actor$RecycleableRunnable.go$1(Actor.scala:57)\n\tat org.http4s.blaze.util.Actor$RecycleableRunnable.run(Actor.scala:72)\n\tat org.http4s.blaze.util.Execution$$anon$3.execute(Execution.scala:65)\n\tat org.http4s.blaze.util.Actor.$bang(Actor.scala:45)\n\tat org.http4s.blaze.util.SerialExecutionContext.execute(SerialExecutionContext.scala:24)\n\tat scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)\n\tat scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)\n\tat scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)\n\tat scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)\n\tat scala.concurrent.Promise.complete(Promise.scala:53)\n\tat scala.concurrent.Promise.complete$(Promise.scala:52)\n\tat scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)\n\tat scala.concurrent.Promise.success(Promise.scala:86)\n\tat scala.concurrent.Promise.success$(Promise.scala:86)\n\tat scala.concurrent.impl.Promise$DefaultPromise.success(Promise.scala:187)\n\tat org.http4s.blaze.channel.nio2.ByteBufferHead$$anon$2.completed(ByteBufferHead.scala:91)\n\tat org.http4s.blaze.channel.nio2.ByteBufferHead$$anon$2.completed(ByteBufferHead.scala:73)\n\tat java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:127)\n\tat java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:439)\n\tat java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)\n\tat java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)\n\tat java.base/sun.nio.ch.KQueuePort$EventHandlerTask.run(KQueuePort.java:312)\n\tat java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n"}
server {"@timestamp":"2020-04-21T10:46:59.780-04:00","@version":"1","message":"GZip middleware encoding content","logger_name":"org.http4s.server.middleware.GZip","thread_name":"ioapp-compute-0","level":"TRACE","level_value":5000}
server {"@timestamp":"2020-04-21T10:46:59.781-04:00","@version":"1","message":"Found Keep-Alive header","logger_name":"org.http4s.server.blaze.Http1ServerStage","thread_name":"ioapp-compute-0","level":"TRACE","level_value":5000}
server {"@timestamp":"2020-04-21T10:46:59.781-04:00","@version":"1","message":"Using Caching Chunk Encoder","logger_name":"org.http4s.server.blaze.Http1ServerStage","thread_name":"ioapp-compute-0","level":"TRACE","level_value":5000}
server {"@timestamp":"2020-04-21T10:46:59.782-04:00","@version":"1","message":"NIO1HeadStage Write Request: ArrayBuffer(java.nio.HeapByteBuffer[pos=0 lim=1529 cap=1529])","logger_name":"org.http4s.blaze.channel.nio1.NIO1HeadStage","thread_name":"scala-execution-context-global-73","level":"TRACE","level_value":5000}
server {"@timestamp":"2020-04-21T10:46:59.782-04:00","@version":"1","message":"265004907: continueWrite completed in 227929ns","logger_name":"org.http4s.blaze.pipeline.stages.SSLStage","thread_name":"scala-execution-context-global-73","level":"TRACE","level_value":5000}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: SSL Handshake WRAP produced 0 bytes.
5 participants