Micronaut 2: HTTP client read timeout errors #3651
Comments
Ok, it's a little bit hard to provide a solution without an example that reproduces the issue 😢 |
Having the same issue. It only happens when the app runs in a pod. Project that is causing the problem: https://github.com/forbi/slow-mn |
@FORBI do you have steps to reproduce for that app? |
@graemerocher Is it possible this commit 1f370c4 caused something like this? |
@dstepanov Could be. If we had a way to reproduce the issue, then we could try a git bisect, or revert the commit and see if it is the cause |
Deploy the app to k8s and hit the endpoint between 5-10 times. |
@graemerocher With logging enabled, I see the log full of messages:
Possible solutions https://stackoverflow.com/questions/15242793/netty-pipeline-warning |
It looks like it happens on the second request, first passes. |
More info:
To me, it looks like the pipeline error is blocking threads that are used for both server and client processing. Is it possible to have separate pools for the server and the clients? |
From the logs, this looks like it is sending a file back, since the chunked writer is in the pipeline. That may help narrow things down |
It's not a file just a JSON request/response |
You could try set You could also replace |
Unfortunately @FORBI's instructions to reproduce were not particularly helpful. His example application seems to require setup and fails with |
I suspect the problem is related to how k8s (running inside AWS for us) is sending HTTP requests to the Micronaut server.
I think you can delete |
If I delete that and then hit http://localhost:8080/debttlk-mn/debttlk/foo, all I see is:
But not the problem in question. |
And yes I have tried deployment via docker and k8s locally |
@graemerocher No netty errors when tracing is enabled? |
What I need is a sample application with clear steps to reproduce and maybe we can make some progress |
Hello world app with a |
@FORBI Please create the hello world app and test it to verify the problem occurs. If it does, document the steps you took to reproduce the issue in a README.md of the project and upload it to GitHub. Then post a link to that project here. We can't go back and forth wasting time with projects that aren't set up correctly to reproduce the issue. |
@jameskleeh https://github.com/forbi/demo Edit: I added a fail.log that shows that the request returns a timeout but the request is still processed. |
Seems a similar or related error but on 1.3.x #3694 |
@graemerocher I see the same errors in the project attached, so it looks like there is a way to reproduce it 👍 |
if it is the same problem |
@graemerocher Read timeouts + "Discarded inbound message" I think it's the same problem |
@graemerocher BTW, the "Discarded" message occurs only in Micronaut 2; it appeared after I migrated to it |
Right, but the read timeouts still occur regardless of that problem |
I created a PR for the discarded message problem: #3695. Regarding the read timeouts in the sample application, these were caused by multiple places where the event loop was being blocked and blocking calls were happening. I resolved them with graemerocher/micronaut-http@5d2f072 I would suggest that folks who are reporting this issue review their code for places where the event loop is being blocked; if it is, you will need to add |
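The truncated suggestion above is presumably the `@ExecuteOn` annotation that also appears in the fixes later in this thread. A minimal sketch of what that looks like on a controller (the class, route, and method names here are made up for illustration):

```java
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.scheduling.TaskExecutors;
import io.micronaut.scheduling.annotation.ExecuteOn;

@Controller("/orders")
public class OrdersController {

    // Route this handler to the IO thread pool so the blocking
    // work below cannot stall the shared Netty event loop.
    @ExecuteOn(TaskExecutors.IO)
    @Get("/report")
    public String report() {
        // e.g. JDBC queries or a blocking HTTP client call
        return buildReportWithBlockingCalls();
    }

    private String buildReportWithBlockingCalls() {
        // placeholder for genuinely blocking work
        return "done";
    }
}
```

The annotation can also be placed at the class level, as in the diff further down in this thread, to move every handler in the controller off the event loop.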
@graemerocher Thanks! I will check whether it helps once it's merged. |
I don't think the issues are related, because the same issue occurs on 1.3.x, at least with that sample app. In Micronaut 2.x the problem is probably more likely to happen if you block the event loop, since we run all operations on the event loop by default. Having said that, some people are reporting the second request failing (see #2905), which we need to investigate as well, but we are still waiting for a sample app for that case |
Maybe a better reproducer https://github.com/volnei/bugreport |
Nope, they are also doing blocking operations in the controller in that report. Application fixed with:

```diff
 .../main/java/gateway/WorkerController.java |  3 ++
 .../src/test/java/gateway/GatewayTest.java  | 30 +++++++++++++++++--
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/gateway/src/main/java/gateway/WorkerController.java b/gateway/src/main/java/gateway/WorkerController.java
index d5cc164..b578643 100644
--- a/gateway/src/main/java/gateway/WorkerController.java
+++ b/gateway/src/main/java/gateway/WorkerController.java
@@ -2,12 +2,15 @@ package gateway;
 import io.micronaut.http.annotation.*;
 import io.micronaut.http.HttpResponse;
+import io.micronaut.scheduling.TaskExecutors;
+import io.micronaut.scheduling.annotation.ExecuteOn;
 import java.util.Map;
 import javax.inject.Inject;
 @Controller("/worker")
+@ExecuteOn(TaskExecutors.IO)
 public class WorkerController {
```
@graemerocher I think it's not a good idea to switch to manual thread selection in version 2; developers are going to run into this kind of issue. Maybe it would be better to introduce some kind of smart auto-selector which would measure the call time and decide whether it should run on the EL or the IO executor. |
We are considering adding blocked thread detection for the event loop. Users can still go back to 1.x behavior by setting the thread selection strategy |
Switching thread selection to AUTO solves the problem; some kind of detection of blocking the event loop would still be nice. |
Where do we set thread selection to AUTO? |
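For reference, a sketch of that configuration in application.yml (assuming the `micronaut.server.thread-selection` property from Micronaut 2.x):

```yaml
micronaut:
  server:
    # AUTO restores 1.x-style behavior: handlers recognized as
    # blocking are scheduled on the IO pool instead of the event loop
    thread-selection: AUTO
```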
I am intermittently getting this issue after upgrading from 1.3 to 2.2.2. Any suggestion or workaround is appreciated. |
@nirvana124 please provide an example that reproduces the issue |
This has just reappeared for us going from 2.0.3 -> 2.2.3. I'm working on an example app, but it seems to be:
|
Just as an update (in case others are as stupid as myself): this was an issue on our side. That being said, it was clearly impacted by some change between 2.0.3 and 2.2.3. In our case we had a JWT validator on some of our endpoints. On our authentication microservice the JWKS URL was set to itself, so authenticated calls would, in a servlet filter, call out to the same service. This only resulted in the thread lock in a k8s environment, I assume, as we had strict requests/limits set, which resulted in a single Netty event loop thread. Hope this helps others who may be doing something similar. |
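For anyone trying to spot the same pattern, the problematic setup looks roughly like this in application.yml (a sketch; the signature name and URL are hypothetical, property path per micronaut-security's JWKS support):

```yaml
micronaut:
  security:
    token:
      jwt:
        signatures:
          jwks:
            auth:
              # The JWKS URL points back at the same service.
              # With a single event loop thread, the inbound request
              # and the outbound key fetch can deadlock each other.
              url: http://localhost:8080/keys
```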
In my case, I'm experiencing a similar problem to @amckee23's while upgrading from 2.1.2 to 2.5.1. I set thread-selection to AUTO and increased the event loop thread pool size to 8, since our EC2 instance is too small, resulting in a single thread (or two) if we don't set it explicitly. We are making blocking HTTP calls inside controllers, sometimes multiple times in a handler. The timeout stacktrace is almost identical to the one at the top of this issue. It would be appreciated if anybody could explain how the timeout can occur with blocking use of HTTP clients. We plan to move to the async APIs, but in the meantime we need some workaround. And I think there must be other people hitting a similar issue; if there is a clear description of how the timeout surfaces, it may be helpful to them. |
To reduce context switching and improve performance, Micronaut's HTTP server and client use the same event loop in Micronaut 2+. If you block this event loop, the server has problems assigning a thread to respond, hence the problem. You have a variety of options to solve this problem:
|
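One non-blocking option, sketched here (the service id and paths are hypothetical): return the client's `Publisher` directly instead of blocking on it, so the event loop thread is released while the downstream call is in flight.

```java
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.client.HttpClient;
import io.micronaut.http.client.annotation.Client;
import org.reactivestreams.Publisher;

@Controller("/proxy")
public class ProxyController {

    private final HttpClient httpClient;

    // "downstream" is a hypothetical configured service id
    public ProxyController(@Client("downstream") HttpClient httpClient) {
        this.httpClient = httpClient;
    }

    @Get("/data")
    public Publisher<String> data() {
        // Returning the Publisher keeps the event loop free;
        // Netty subscribes and writes the response when it arrives.
        return httpClient.retrieve("/data");
    }
}
```

With this shape there is no need for `@ExecuteOn`, because the handler never blocks.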
@graemerocher thank you very much. BTW, do you think 8 event loop threads together with thread-selection AUTO can still cause read timeouts in recent versions of Micronaut? Even just a yes or no would be really helpful to me. Since the default IO thread pool is unbounded, it's quite weird to me that we hit the read timeout with that many event loop threads. Our server is still quite low traffic, actually running one or two handlers at the same time. Also, I appreciate your advice above; I'll try to configure a separate thread pool for the server side. *) With the exact same setting of thread-selection AUTO and no explicitly set event loop threads, we get no timeouts on 2.1.2 and timeouts on 2.5.1. |
@amckee23 Thanks a lot. That was the very cause of the regression we were experiencing. The problem also manifests when |
We have come up against this issue recently. The issue appears only when running in ECS Fargate (Docker). We experience the issue with a GraphQL endpoint, which is configured in application.yml:
And it only happens when the GraphQL operation calls a downstream service using the Micronaut HTTP client. After some testing, we found that it usually happens on the 5th or 6th network call out of 10 requests. The request times out (~10 seconds) and a Read Timeout exception occurs. The fix for us was setting the event loops in config:
Posting, in case anyone else comes across this issue in later versions. |
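The config snippet itself didn't survive in the comment above, but the shape of such a fix is roughly as follows (an assumption on my part; the `num-threads` value is illustrative, property names per Micronaut's `micronaut.netty.event-loops` and client `event-loop-group` configuration):

```yaml
micronaut:
  netty:
    event-loops:
      # a dedicated loop group for outbound HTTP client calls
      other:
        num-threads: 10
  http:
    client:
      # stop the client sharing the server's event loop threads
      event-loop-group: other
```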
@dejanvasic85 Thanks a lot, this was extremely helpful. I got the same problem, but without GraphQL, in micronautVersion=3.7.1
and
and the issue is gone. |
Not sure exactly what is going on, but getting a lot of these when deployed. I saw there was some issue regarding HTTP/2, but this is the same code as in 1.3. I have tried different stuff but cannot reproduce locally.