Micronaut 2: Http client read timeouts errors #3651

Closed · dstepanov opened this issue Jul 6, 2020 · 48 comments

Labels: status: pr submitted (a pull request has been submitted for the issue)
Milestone: 2.0.1

Comments
@dstepanov
Contributor

Not sure exactly what is going on, but we are getting a lot of these errors when deployed. I saw there was some issue regarding HTTP/2, but this is the same code as in 1.3. I have tried different things but cannot reproduce locally.

i.m.h.c.e.ReadTimeoutException: Read Timeout
	at i.m.h.c.e.ReadTimeoutException.<clinit>(ReadTimeoutException.java:26)
	at i.m.h.c.n.DefaultHttpClient.lambda$null$35(DefaultHttpClient.java:1152)
	at i.r.i.o.f.FlowableOnErrorNext$OnErrorNextSubscriber.onError(FlowableOnErrorNext.java:103)
	at i.m.r.r.RxInstrumentedSubscriber.onError(RxInstrumentedSubscriber.java:66)
	at i.r.i.o.f.FlowableTimeoutTimed$TimeoutSubscriber.onTimeout(FlowableTimeoutTimed.java:139)
	at i.r.i.o.f.FlowableTimeoutTimed$TimeoutTask.run(FlowableTimeoutTimed.java:170)
	at i.r.i.s.ScheduledRunnable.run(ScheduledRunnable.java:66)
	at i.r.i.s.ScheduledRunnable.call(ScheduledRunnable.java:57)
	at j.u.c.FutureTask.run(FutureTask.java:264)
	at j.u.c.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(Thread.java:834)
@graemerocher
Contributor

Ok, it's a little bit hard to provide a solution without an example that reproduces the issue 😢

@marianozunino

Having the same issue. It only happens when the app runs in a pod.

Project that is causing the problem: https://github.com/forbi/slow-mn

@graemerocher
Contributor

@FORBI do you have steps to reproduce for that app?

@dstepanov
Contributor Author

@graemerocher Is it possible this commit 1f370c4 caused something like this?

@graemerocher
Contributor

graemerocher commented Jul 7, 2020

@dstepanov Could be. If we had a way to reproduce the issue then we could try a git bisect, or revert the commit and see if it is the cause.

@marianozunino

marianozunino commented Jul 7, 2020

@FORBI do you have steps to reproduce for that app?

Deploy the app to k8s and hit the endpoint 5-10 times.

graemerocher added the "status: awaiting validation" label Jul 7, 2020
@dstepanov
Contributor Author

@graemerocher With logging enabled, I see the log full of messages:

Discarded inbound message EmptyLastHttpContent that reached at the tail of the pipeline. Please check your pipeline configuration.
Discarded message pipeline : [LoggingHandler#0, idle-state, http-server-codec, flow-control-handler, http-keep-alive-handler, http-compressor, http-decompressor, http-streams-codec, chunk-writer, micronaut-http-decoder, micronaut-http-encoder, websocket-upgrade-handler, micronaut-inbound-handler, DefaultChannelPipeline$TailContext#0]. Channel : [id: 0x4229cf2d, L:/100.96.42.204:8080 - R:/100.96.42.67:43418].

Possible solutions: https://stackoverflow.com/questions/15242793/netty-pipeline-warning

@dstepanov
Contributor Author

It looks like it happens on the second request; the first passes.

@dstepanov
Contributor Author

dstepanov commented Jul 8, 2020

More info:

Jul 8, 2020 @ 09:48:14.496 nioEventLoopGroup-3-3 Read timeout at io.micronaut.http.client.netty.DefaultHttpClient.lambda$null$35(DefaultHttpClient.java:1152) (probably occurred before CONNECT)

Jul 8, 2020 @ 09:48:14.494 nioEventLoopGroup-3-3 [id: 0x3bed0818] CONNECT: client request connected to host
Jul 8, 2020 @ 09:48:14.493 nioEventLoopGroup-3-3 Incoming HTTP Server Request

Multiple "Discarded inbound message" + "Discarded message pipeline" on nioEventLoopGroup-3-2 and nioEventLoopGroup-3-3

Jul 8, 2020 @ 09:47:13.489 nioEventLoopGroup-3-3 [id: 0x3bed0818] REGISTERED - incoming client request

multiple server + client requests

To me, it looks like the pipeline error is blocking threads that are used for both server and client processing.

Is it possible to have separate pools for the server and clients?

@graemerocher
Contributor

From the logs this looks like it is sending a file back, since the chunk-writer is in the pipeline. That may help narrow things down.

@dstepanov
Contributor Author

It's not a file, just a JSON request/response.

@graemerocher
Contributor

You could try setting micronaut.server.thread-selection: IO, which runs all server requests on the IO thread pool.
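
For reference, a minimal application.yml sketch of that setting:

micronaut:
  server:
    thread-selection: IO  # run all server requests on the IO thread pool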

You could also replace DefaultExecutorSelector with a new bean that runs on the executor service of your choice, so yes, it is possible to alter the behaviour. Still, that doesn't get us closer to the root of the problem; I will hopefully get time soon to look at @FORBI's example.

@graemerocher
Contributor

Unfortunately @FORBI's instructions to reproduce were not particularly helpful. His example application seems to require setup and fails with Unexpected error occurred: Failed to inject value for parameter [typeOfSrvE] of class: uy.com.antel.pi.debttlk.service.DebtFinder out of the box. Maybe if someone can provide a useful set of steps to reproduce this issue we can take a look.

@dstepanov
Contributor Author

dstepanov commented Jul 8, 2020

I suspect the problem is related to how k8s (running inside AWS for us) is sending HTTP requests to the Micronaut server.
Simple hello-world with logging might produce the same Netty errors as I posted, maybe using a different HTTP client would also trigger the problem.

Unfortunately @FORBI's instructions to reproduce were not particularly helpful. His example application seems to require setup and fails with Unexpected error occurred: Failed to inject value for parameter [typeOfSrvE] of class: uy.com.antel.pi.debttlk.service.DebtFinder out of the box. Maybe if someone can provide a useful set of steps to reproduce this issue we can take a look.

I think you can delete typeOfSrvE

@graemerocher
Contributor

If I delete that and then hit http://localhost:8080/debttlk-mn/debttlk/foo, all I see is:

<html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx</center>
</body>
</html>

io.micronaut.http.client.exceptions.HttpClientResponseException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx</center>
</body>
</html>

	at io.micronaut.http.client.netty.DefaultHttpClient$11.channelRead0(DefaultHttpClient.java:2046)
	at io.micronaut.http.client.netty.DefaultHttpClient$11.channelRead0(DefaultHttpClient.java:1964)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)

But not the problem in question.

@graemerocher
Contributor

And yes, I have tried deploying via Docker and k8s locally.

@dstepanov
Contributor Author

@graemerocher No netty errors when tracing is enabled?

@graemerocher
Contributor

What I need is a sample application with clear steps to reproduce, and maybe we can make some progress.

@marianozunino

A hello-world app with a @Client should be enough. And as I said, in my case it only happens in k8s.
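
For illustration, a minimal sketch of such an app (the class names and the /hello route are hypothetical; the declarative client deliberately calls back into the same server, the self-call scenario discussed later in this thread). With Micronaut 2's default thread selection, the blocking proxy() call runs on the same event loop the server needs, so a few concurrent requests can starve it and surface as ReadTimeoutException:

import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.client.annotation.Client;

@Controller("/hello")
public class HelloController {

    private final HelloClient client;

    public HelloController(HelloClient client) {
        this.client = client;
    }

    @Get("/direct")
    public String direct() {
        return "Hello";
    }

    @Get("/proxy")
    public String proxy() {
        // Blocking call back into the same server: on the shared event loop
        // this can starve the loop and surface as ReadTimeoutException.
        return client.direct();
    }
}

// Declarative client; a relative URL resolves against the current server.
@Client("/hello")
interface HelloClient {

    @Get("/direct")
    String direct();
}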

@jameskleeh
Contributor

@FORBI Please create the hello world app and test to verify the problem occurs. If it does, document the steps you took to reproduce the issue in a README.md of the project and upload it to GitHub. Then create a link to that project here. We can't go back and forth wasting time with projects that aren't set up correctly to reproduce the issue.

@marianozunino

marianozunino commented Jul 8, 2020

@jameskleeh https://github.com/forbi/demo

Edit: I added a fail.log that shows that the request returns a timeout but the request is still processed.

@graemerocher
Contributor

Seems like a similar or related error, but on 1.3.x: #3694

@dstepanov
Contributor Author

@graemerocher I see the same errors in the project attached, so it looks like there is a way to reproduce it 👍

@graemerocher
Contributor

If it is the same problem.

@dstepanov
Contributor Author

@graemerocher Read timeouts plus "Discarded inbound message": I think it's the same problem.

@dstepanov
Contributor Author

@graemerocher BTW, the "Discarded" message occurs only in Micronaut 2; I migrated the project to it.

@graemerocher
Contributor

Right, but the read timeouts still occur regardless of that problem

@graemerocher
Contributor

graemerocher commented Jul 9, 2020

I created a PR for the discarded message problem #3695

Regarding the read timeout problem in the sample application: it was caused by multiple places where the event loop was being blocked by blocking calls. I resolved them with

graemerocher/micronaut-http@5d2f072

I would suggest that folks who are reporting this issue review their code for places where the event loop is being blocked; if it is, you will need to add @ExecuteOn(TaskExecutors.IO) in Micronaut 2.0, or preferably don't block the event loop.
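
A minimal sketch of that annotation on a hypothetical controller (the /orders route and the blocking work are placeholders):

import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.scheduling.TaskExecutors;
import io.micronaut.scheduling.annotation.ExecuteOn;

@Controller("/orders")
public class OrdersController {

    @Get("/{id}")
    @ExecuteOn(TaskExecutors.IO) // offload blocking work from the event loop
    public String find(String id) {
        // a blocking database or HTTP call would go here
        return "order-" + id;
    }
}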

graemerocher added the "status: pr submitted" label and removed the "status: awaiting validation" label Jul 9, 2020
graemerocher added this to the 2.0.1 milestone Jul 9, 2020
@dstepanov
Contributor Author

@graemerocher Thanks!

I will try it when it's merged to see if it helps.
I don't really understand how it can deadlock the event loop; the only case I can think of is if the app calls itself.

@graemerocher
Contributor

I don't think the issues are related, because the same issue occurs on 1.3.x, at least with that sample app. In Micronaut 2.x the problem is probably more likely to happen if you block the event loop, since we run all operations on the event loop by default.

Having said that, some people are reporting the second request failing (see #2905), which we need to investigate as well, but we are still waiting for a sample app for that case.

@graemerocher
Contributor

Maybe a better reproducer: https://github.com/volnei/bugreport

@graemerocher
Contributor

Nope, they are also doing blocking operations in the controller in that report. The application was fixed with:

---
 .../main/java/gateway/WorkerController.java   |  3 ++
 .../src/test/java/gateway/GatewayTest.java    | 30 +++++++++++++++++--
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/gateway/src/main/java/gateway/WorkerController.java b/gateway/src/main/java/gateway/WorkerController.java
index d5cc164..b578643 100644
--- a/gateway/src/main/java/gateway/WorkerController.java
+++ b/gateway/src/main/java/gateway/WorkerController.java
@@ -2,12 +2,15 @@ package gateway;
 
 import io.micronaut.http.annotation.*;
 import io.micronaut.http.HttpResponse;
+import io.micronaut.scheduling.TaskExecutors;
+import io.micronaut.scheduling.annotation.ExecuteOn;
 
 import java.util.Map;
 
 import javax.inject.Inject;
 
 @Controller("/worker")
+@ExecuteOn(TaskExecutors.IO)
 public class WorkerController {
-- 

@dstepanov
Contributor Author

@graemerocher I think it's not a good idea to switch to manual thread selection in version 2; developers are going to run into this kind of issue. Maybe it would be better to introduce some kind of smart auto-selector that measures the call time and decides whether it should run on the event loop or the IO executor.

@graemerocher
Contributor

We are considering adding blocked thread detection for the event loop.

Users can still go back to 1.x behavior by setting the thread selection strategy

jameskleeh added a commit that referenced this issue Jul 10, 2020
Co-authored-by: James Kleeh <james.kleeh@gmail.com>
@dstepanov
Contributor Author

Switching thread selection to AUTO solves the problem. Some kind of detection of event loop blocking would be nice.

@maheshloya-aegon

Where do we set thread selection to AUTO?

@graemerocher
Contributor

See https://docs.micronaut.io/latest/guide/configurationreference.html#io.micronaut.http.server.HttpServerConfiguration
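
Concretely, in application.yml (a minimal sketch; AUTO selects the pool based on the declared return type, which was the 1.x behaviour):

micronaut:
  server:
    thread-selection: AUTO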

@nirvana124

I am intermittently getting this issue after upgrading from 1.3 to 2.2.2:

  • tried thread selection as AUTO/IO
  • increased the HTTP client timeout to 20s
  • each controller is annotated with @ExecuteOn(TaskExecutors.IO)
  • I have blocking REST API calls in my service, but the controllers return Single<List>

Any suggestion or workaround is appreciated.

@graemerocher
Contributor

@nirvana124 provide an example that reproduces the issue

@amckee23

This has just reappeared for us going from 2.0.3 -> 2.2.3. I’m working on an example app, but it seems to be:

  • Regardless of thread selection
  • Only inside k8s
  • When using an mn client

@amckee23

Just as an update (in case others are as stupid as me): this was an issue on our side. That being said, it was clearly impacted by some change between 2.0.3 and 2.2.3.

In our case we had a JWT validator on some of our endpoints. On our authentication microservice the JWKS URL was set to the service itself, so authenticated calls would, in a servlet filter, call out to the same service. I assume this only resulted in the thread lock in a k8s environment because we had strict requests/limits set, which resulted in a single Netty event loop thread.

Hope this helps others who may be doing something similar.

@dynaxis

dynaxis commented May 18, 2021

In my case, I am experiencing a similar problem to @amckee23's while upgrading from 2.1.2 to 2.5.1. I set thread-selection to AUTO and increased the event loop thread pool size to 8, since our EC2 instance is too small and would otherwise default to one (or two) threads. We are making blocking HTTP calls inside controllers, sometimes multiple times in a handler. The timeout stack trace is almost identical to the one at the top of this issue.

It would be appreciated if anybody could explain how the timeout can occur with blocking use of the HTTP client. We plan to move to the async APIs, but in the meantime we need some workarounds. I think there must be other people hitting a similar issue; if there is a clear description of how the timeout surfaces, it may be helpful to them.

@graemerocher
Contributor

graemerocher commented May 18, 2021

To reduce context switching and improve performance, Micronaut's HTTP server and client use the same event loop in Micronaut 2+.

If you block this event loop then the server has problems assigning a thread to respond, hence the problem. You have a variety of options to solve it:

  1. Don't block
  2. Set thread-selection to IO and always offload operations to an IO thread pool
  3. Configure a different event loop for the server (see the sketch below). See https://docs.micronaut.io/latest/guide/#threadPools
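
A minimal application.yml sketch of option 3, following the linked docs (the event loop name "other" and the thread count are placeholders):

micronaut:
  netty:
    event-loops:
      other:
        num-threads: 10
  server:
    netty:
      worker:
        event-loop-group: other  # keep server work off the default loop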

@dynaxis

dynaxis commented May 18, 2021

@graemerocher thank you very much. BTW, do you think 8 event loop threads together with thread-selection AUTO can still cause read timeouts in recent versions of Micronaut? Even just a yes or no would be really helpful. Since the default IO thread pool is unbounded, it's quite weird to me that we hit the read timeout with that many event loop threads. Our server is quite low traffic, actually running only one or two handlers at the same time.

Also, I appreciate your advice above. I'll try to configure a separate thread pool for the server side.

*) With the exact same setting of thread-selection AUTO and no explicitly set event loop threads, we get no timeouts on 2.1.2 and timeouts on 2.5.1.

@amckee23

Hi @dynaxis, we saw something similar in 2.5.1, which was caused by #5389. That issue caused the thread selection logic to be bypassed on controllers that specified a body.

It was resolved in 2.5.2, though.

@dynaxis

dynaxis commented May 18, 2021

@amckee23 Thanks a lot. That was the very cause of the regression we were experiencing. The problem also manifests when micronaut.server.thread-selection is set to IO. We will upgrade to 2.5.3.

@dejanvasic85

dejanvasic85 commented Sep 16, 2022

We have come up against this issue recently with micronautVersion=3.5.4.

The issue appears only when running in ECS Fargate (Docker). We experience it with a GraphQL endpoint, configured in application.yml:

graphql:
  enabled: true

And it only happens when the GraphQL operation calls a downstream service using the Micronaut HTTP client. After some testing, we found that it usually happens on the 5th or 6th network call out of 10 requests. The request times out (~10 seconds) and a Read Timeout exception occurs.

The fix for us was setting the event loops in config:

micronaut:
  application:
    name: xxx

  netty:
    event-loops:
      custom-event-loop:
        num-threads: 20

  http:
    client:
      event-loop-group: custom-event-loop
      read-timeout: 45s 

Posting in case anyone else comes across this issue in later versions.

@chriskosys

@dejanvasic85 Thanks a lot, this was extremely helpful. I got the same problem, but without GraphQL, on micronautVersion=3.7.1: after about 20 requests I got a timeout. Now I added

  netty:
    event-loops:
      custom-event-loop:
        num-threads: 20

and

  http:
    client:
      event-loop-group: custom-event-loop

and the issue is gone.
