Micronaut 2: Http client read timeouts errors #3651

Closed · dstepanov opened this issue Jul 6, 2020 · 48 comments

Labels: status: pr submitted (a pull request has been submitted for the issue)
Milestone: 2.0.1

Comments
@dstepanov
Contributor

Not sure exactly what is going on, but we are getting a lot of these errors when deployed. I saw there was some issue regarding HTTP/2, but this is the same code as in 1.3. I have tried different things but cannot reproduce locally.

i.m.h.c.e.ReadTimeoutException: Read Timeout
	at i.m.h.c.e.ReadTimeoutException.<clinit>(ReadTimeoutException.java:26)
	at i.m.h.c.n.DefaultHttpClient.lambda$null$35(DefaultHttpClient.java:1152)
	at i.r.i.o.f.FlowableOnErrorNext$OnErrorNextSubscriber.onError(FlowableOnErrorNext.java:103)
	at i.m.r.r.RxInstrumentedSubscriber.onError(RxInstrumentedSubscriber.java:66)
	at i.r.i.o.f.FlowableTimeoutTimed$TimeoutSubscriber.onTimeout(FlowableTimeoutTimed.java:139)
	at i.r.i.o.f.FlowableTimeoutTimed$TimeoutTask.run(FlowableTimeoutTimed.java:170)
	at i.r.i.s.ScheduledRunnable.run(ScheduledRunnable.java:66)
	at i.r.i.s.ScheduledRunnable.call(ScheduledRunnable.java:57)
	at j.u.c.FutureTask.run(FutureTask.java:264)
	at j.u.c.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(Thread.java:834)
@graemerocher
Contributor

Ok, it's a little bit hard to provide a solution without an example that reproduces the issue 😢

@marianozunino

Having the same issue. It only happens when the app runs in a pod.

Project that is causing the problem: https://github.com/forbi/slow-mn

@graemerocher
Contributor

@FORBI do you have steps to reproduce for that app?

@dstepanov
Contributor Author

@graemerocher Is it possible this commit 1f370c4 caused something like this?

@graemerocher
Contributor

graemerocher commented Jul 7, 2020

@dstepanov Could be. If we had a way to reproduce the issue then we could try a git bisect, or revert the commit and see if it is the cause.

@marianozunino

marianozunino commented Jul 7, 2020

@FORBI do you have steps to reproduce for that app?

Deploy the app to k8s and hit the endpoint 5-10 times.

graemerocher added the "status: awaiting validation" label Jul 7, 2020
@dstepanov
Contributor Author

@graemerocher With logging enabled, I see the log full of messages:

Discarded inbound message EmptyLastHttpContent that reached at the tail of the pipeline. Please check your pipeline configuration.
Discarded message pipeline : [LoggingHandler#0, idle-state, http-server-codec, flow-control-handler, http-keep-alive-handler, http-compressor, http-decompressor, http-streams-codec, chunk-writer, micronaut-http-decoder, micronaut-http-encoder, websocket-upgrade-handler, micronaut-inbound-handler, DefaultChannelPipeline$TailContext#0]. Channel : [id: 0x4229cf2d, L:/100.96.42.204:8080 - R:/100.96.42.67:43418].

Possible solutions: https://stackoverflow.com/questions/15242793/netty-pipeline-warning

@dstepanov
Contributor Author

It looks like it happens on the second request; the first passes.

@dstepanov
Contributor Author

dstepanov commented Jul 8, 2020

More info:

Jul 8, 2020 @ 09:48:14.496 nioEventLoopGroup-3-3 Read timeout at io.micronaut.http.client.netty.DefaultHttpClient.lambda$null$35(DefaultHttpClient.java:1152) (probably occurred before CONNECT)

Jul 8, 2020 @ 09:48:14.494 nioEventLoopGroup-3-3 [id: 0x3bed0818] CONNECT: client request connected to host
Jul 8, 2020 @ 09:48:14.493 nioEventLoopGroup-3-3 Incoming HTTP Server Request

Multiple "Discarded inbound message" + "Discarded message pipeline" on nioEventLoopGroup-3-2 and nioEventLoopGroup-3-3

Jul 8, 2020 @ 09:47:13.489 nioEventLoopGroup-3-3 [id: 0x3bed0818] REGISTERED - incoming client request

multiple server + client requests

To me, it looks like the pipeline error is blocking threads that are used for both server and client processing.

Is it possible to have separate pools for the server and clients?

@graemerocher
Contributor

From the logs this looks like it is sending a file back, since the chunk-writer is in the pipeline. That may help narrow things down.

@dstepanov
Contributor Author

It's not a file, just a JSON request/response.

@graemerocher
Contributor

You could try setting micronaut.server.thread-selection: IO, which runs all server requests on the IO thread pool.
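
For reference, a minimal application.yml sketch of that setting:

micronaut:
  server:
    thread-selection: IO  # run all server requests on the IO thread pool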

You could also replace DefaultExecutorSelector with a new bean that runs on the executor service of your choice, so yes, it is possible to alter the behaviour. Still, that doesn't get us closer to the root of the problem; I will hopefully get time soon to look at @FORBI's example.

@graemerocher
Contributor

Unfortunately @FORBI's instructions to reproduce were not particularly helpful. His example application seems to require setup and fails with Unexpected error occurred: Failed to inject value for parameter [typeOfSrvE] of class: uy.com.antel.pi.debttlk.service.DebtFinder out of the box. Maybe if someone can provide a useful set of steps to reproduce this issue we can take a look.

@dstepanov
Contributor Author

dstepanov commented Jul 8, 2020

I suspect the problem is related to how k8s (running inside AWS for us) is sending HTTP requests to the Micronaut server.
Simple hello-world with logging might produce the same Netty errors as I posted, maybe using a different HTTP client would also trigger the problem.

Unfortunately @FORBI's instructions to reproduce were not particularly helpful. His example application seems to require setup and fails with Unexpected error occurred: Failed to inject value for parameter [typeOfSrvE] of class: uy.com.antel.pi.debttlk.service.DebtFinder out of the box. Maybe if someone can provide a useful set of steps to reproduce this issue we can take a look.

I think you can delete typeOfSrvE

@graemerocher
Contributor

If I delete that and then hit http://localhost:8080/debttlk-mn/debttlk/foo, all I see is:

<html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx</center>
</body>
</html>

io.micronaut.http.client.exceptions.HttpClientResponseException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx</center>
</body>
</html>

	at io.micronaut.http.client.netty.DefaultHttpClient$11.channelRead0(DefaultHttpClient.java:2046)
	at io.micronaut.http.client.netty.DefaultHttpClient$11.channelRead0(DefaultHttpClient.java:1964)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)

But not the problem in question.

@graemerocher
Contributor

And yes, I have tried deploying via Docker and k8s locally.

@dstepanov
Contributor Author

@graemerocher No netty errors when tracing is enabled?

@graemerocher
Contributor

What I need is a sample application with clear steps to reproduce, and maybe we can make some progress.

@marianozunino

A hello-world app with a @Client should be enough. And as I said, in my case it only happens in k8s.
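
For illustration, a minimal sketch of such an app (the class names and the /hello route are hypothetical; the declarative client deliberately calls back into the same server, the self-call scenario discussed later in this thread). With Micronaut 2's default thread selection, the blocking proxy() call runs on the same event loop the server needs, so a few concurrent requests can starve it and surface as ReadTimeoutException:

import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.client.annotation.Client;

@Controller("/hello")
public class HelloController {

    private final HelloClient client;

    public HelloController(HelloClient client) {
        this.client = client;
    }

    @Get("/direct")
    public String direct() {
        return "Hello";
    }

    @Get("/proxy")
    public String proxy() {
        // Blocking call back into the same server: on the shared event loop
        // this can starve the loop and surface as ReadTimeoutException.
        return client.direct();
    }
}

// Declarative client; a relative URL resolves against the current server.
@Client("/hello")
interface HelloClient {

    @Get("/direct")
    String direct();
}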

@jameskleeh
Contributor

@FORBI Please create the hello world app and test to verify the problem occurs. If it does, document the steps you took to reproduce the issue in a README.md of the project and upload it to GitHub. Then create a link to that project here. We can't go back and forth wasting time with projects that aren't set up correctly to reproduce the issue.

@marianozunino

marianozunino commented Jul 8, 2020

@jameskleeh https://github.com/forbi/demo

Edit: I added a fail.log that shows that the request returns a timeout but the request is still processed.

@graemerocher
Contributor

Seems like a similar or related error, but on 1.3.x: #3694

@dstepanov
Contributor Author

@graemerocher I see the same errors in the project attached, so it looks like there is a way to reproduce it 👍

@graemerocher
Contributor

If it is the same problem.

@dstepanov
Contributor Author

@graemerocher Read timeouts plus "Discarded inbound message": I think it's the same problem.

@dstepanov
Contributor Author

@graemerocher BTW, the "Discarded" message occurs only in Micronaut 2; I migrated the project to it.

@graemerocher
Contributor

Right, but the read timeouts still occur regardless of that problem

@graemerocher
Contributor

graemerocher commented Jul 9, 2020

I created a PR for the discarded message problem #3695

Regarding the read timeout problem in the sample application: it was caused by multiple places where the event loop was being blocked by blocking calls. I resolved them with

graemerocher/micronaut-http@5d2f072

I would suggest that folks who are reporting this issue review their code for places where the event loop is being blocked; if it is, you will need to add @ExecuteOn(TaskExecutors.IO) in Micronaut 2.0, or preferably don't block the event loop.
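
A minimal sketch of that annotation on a hypothetical controller (the /orders route and the blocking work are placeholders):

import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.scheduling.TaskExecutors;
import io.micronaut.scheduling.annotation.ExecuteOn;

@Controller("/orders")
public class OrdersController {

    @Get("/{id}")
    @ExecuteOn(TaskExecutors.IO) // offload blocking work from the event loop
    public String find(String id) {
        // a blocking database or HTTP call would go here
        return "order-" + id;
    }
}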

graemerocher added the "status: pr submitted" label and removed the "status: awaiting validation" label Jul 9, 2020
graemerocher added this to the 2.0.1 milestone Jul 9, 2020
@dstepanov
Contributor Author

@graemerocher Thanks!

I will try it when it's merged to see if it helps.
I don't really understand how it can deadlock the event loop; the only case I can think of is if the app calls itself.

@graemerocher
Contributor

I don't think the issues are related, because the same issue occurs on 1.3.x, at least with that sample app. In Micronaut 2.x the problem is probably more likely to happen if you block the event loop, since we run all operations on the event loop by default.

Having said that, some people are reporting the second request failing (see #2905), which we need to investigate as well, but we are still waiting for a sample app for that case.

@graemerocher
Contributor

Maybe a better reproducer: https://github.com/volnei/bugreport

@graemerocher
Contributor

Nope, they are also doing blocking operations in the controller in that report. The application was fixed with:

---
 .../main/java/gateway/WorkerController.java   |  3 ++
 .../src/test/java/gateway/GatewayTest.java    | 30 +++++++++++++++++--
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/gateway/src/main/java/gateway/WorkerController.java b/gateway/src/main/java/gateway/WorkerController.java
index d5cc164..b578643 100644
--- a/gateway/src/main/java/gateway/WorkerController.java
+++ b/gateway/src/main/java/gateway/WorkerController.java
@@ -2,12 +2,15 @@ package gateway;
 
 import io.micronaut.http.annotation.*;
 import io.micronaut.http.HttpResponse;
+import io.micronaut.scheduling.TaskExecutors;
+import io.micronaut.scheduling.annotation.ExecuteOn;
 
 import java.util.Map;
 
 import javax.inject.Inject;
 
 @Controller("/worker")
+@ExecuteOn(TaskExecutors.IO)
 public class WorkerController {
-- 

@dstepanov
Contributor Author

@graemerocher I think it's not a good idea to switch to manual thread selection in version 2; developers are going to run into this kind of issue. Maybe it would be better to introduce some kind of smart auto-selector that measures the call time and decides whether it should run on the event loop or the IO executor.

@graemerocher
Contributor

We are considering adding blocked thread detection for the event loop.

Users can still go back to 1.x behavior by setting the thread selection strategy

jameskleeh added a commit that referenced this issue Jul 10, 2020
Co-authored-by: James Kleeh <james.kleeh@gmail.com>
@dstepanov
Contributor Author

Switching thread selection to AUTO solves the problem. Some kind of detection of event loop blocking would be nice.

@maheshloya-aegon

Where do we set thread selection to AUTO?

@graemerocher
Contributor

See https://docs.micronaut.io/latest/guide/configurationreference.html#io.micronaut.http.server.HttpServerConfiguration
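
Concretely, in application.yml (a minimal sketch; AUTO selects the pool based on the declared return type, which was the 1.x behaviour):

micronaut:
  server:
    thread-selection: AUTO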

@nirvana124

I am intermittently getting this issue after upgrading from 1.3 to 2.2.2:

  • tried thread selection as AUTO/IO
  • increased the HTTP client timeout to 20s
  • each controller is annotated with @ExecuteOn(TaskExecutors.IO)
  • I have blocking REST API calls in my service, but the controllers return Single<List>

Any suggestion or workaround is appreciated.

@graemerocher
Contributor

@nirvana124 provide an example that reproduces the issue

@amckee23

This has just reappeared for us going from 2.0.3 -> 2.2.3. I’m working on an example app, but it seems to be:

  • Regardless of thread selection
  • Only inside k8s
  • When using an mn client

@amckee23

Just as an update (in case others are as stupid as me): this was an issue on our side. That being said, it was clearly impacted by some change between 2.0.3 and 2.2.3.

In our case we had a JWT validator on some of our endpoints. On our authentication microservice the JWKS URL was set to the service itself, so authenticated calls would, in a servlet filter, call out to the same service. I assume this only resulted in the thread lock in a k8s environment because we had strict requests/limits set, which resulted in a single Netty event loop thread.

Hope this helps others who may be doing something similar.

@dynaxis

dynaxis commented May 18, 2021

In my case, I am experiencing a similar problem to @amckee23's while upgrading from 2.1.2 to 2.5.1. I set thread-selection to AUTO and increased the event loop thread pool size to 8, since our EC2 instance is too small and would otherwise default to one (or two) threads. We are making blocking HTTP calls inside controllers, sometimes multiple times in a handler. The timeout stack trace is almost identical to the one at the top of this issue.

It would be appreciated if anybody could explain how the timeout can occur with blocking use of the HTTP client. We plan to move to the async APIs, but in the meantime we need some workarounds. I think there must be other people hitting a similar issue; if there is a clear description of how the timeout surfaces, it may be helpful to them.

@graemerocher
Contributor

graemerocher commented May 18, 2021

To reduce context switching and improve performance, Micronaut's HTTP server and client use the same event loop in Micronaut 2+.

If you block this event loop then the server has problems assigning a thread to respond, hence the problem. You have a variety of options to solve it:

  1. Don't block
  2. Set thread-selection to IO and always offload operations to an IO thread pool
  3. Configure a different event loop for the server (see the sketch below). See https://docs.micronaut.io/latest/guide/#threadPools
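
A minimal application.yml sketch of option 3, following the linked docs (the event loop name "other" and the thread count are placeholders):

micronaut:
  netty:
    event-loops:
      other:
        num-threads: 10
  server:
    netty:
      worker:
        event-loop-group: other  # keep server work off the default loop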

@dynaxis

dynaxis commented May 18, 2021

@graemerocher thank you very much. BTW, do you think 8 event loop threads together with thread-selection AUTO can still cause read timeouts in recent versions of Micronaut? Even just a yes or no would be really helpful. Since the default IO thread pool is unbounded, it's quite weird to me that we hit the read timeout with that many event loop threads. Our server is quite low traffic, actually running only one or two handlers at the same time.

Also, I appreciate your advice above. I'll try to configure a separate thread pool for the server side.

*) With the exact same setting of thread-selection AUTO and no explicitly set event loop threads, we get no timeouts on 2.1.2 and timeouts on 2.5.1.

@amckee23

Hi @dynaxis, we saw something similar in 2.5.1, which was caused by #5389. That issue caused the thread selection logic to be bypassed on controllers that specified a body.

It was resolved in 2.5.2, though.

@dynaxis

dynaxis commented May 18, 2021

@amckee23 Thanks a lot. That was the very cause of the regression we were experiencing. The problem also manifests when micronaut.server.thread-selection is set to IO. We will upgrade to 2.5.3.

@dejanvasic85

dejanvasic85 commented Sep 16, 2022

We have come up against this issue recently with micronautVersion=3.5.4.

The issue appears only when running in ECS Fargate (Docker). We experience it with a GraphQL endpoint, configured in application.yml:

graphql:
  enabled: true

And it only happens when the GraphQL operation calls a downstream service using the Micronaut HTTP client. After some testing, we found that it usually happens on the 5th or 6th network call out of 10 requests. The request times out (~10 seconds) and a Read Timeout exception occurs.

The fix for us was setting the event loops in config:

micronaut:
  application:
    name: xxx

  netty:
    event-loops:
      custom-event-loop:
        num-threads: 20

  http:
    client:
      event-loop-group: custom-event-loop
      read-timeout: 45s 

Posting in case anyone else comes across this issue in later versions.

@chriskosys

@dejanvasic85 Thanks a lot, this was extremely helpful. I got the same problem, but without GraphQL, on micronautVersion=3.7.1: after about 20 requests I got a timeout. Now I added

  netty:
    event-loops:
      custom-event-loop:
        num-threads: 20

and

  http:
    client:
      event-loop-group: custom-event-loop

and the issue is gone.
