Intermittent broken pipe #24387
Comments
Could you paste the
You can try setting the ISTIO_META_IDLE_TIMEOUT env variable to a duration (e.g., 24h0m0s) on the istio-proxy for this workload and that should do the trick. Let me know how that works for you.
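A minimal sketch of one way to set this on an injected workload in newer Istio releases, using the proxyMetadata field of the proxy.istio.io/config annotation; the workload name, image, and timeout value are placeholders, not a config taken from this thread:

```yaml
# Sketch only: proxyMetadata entries become environment variables on the
# injected istio-proxy container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # hypothetical workload name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        proxy.istio.io/config: |
          proxyMetadata:
            ISTIO_META_IDLE_TIMEOUT: "24h0m0s"
    spec:
      containers:
      - name: my-app
        image: my-app:latest        # placeholder image
```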
Hi, thanks for the response. I added the following annotation:
I tried adding this to the client and server in this scenario and it didn't stop the timeouts. We have also tried to add this EnvoyFilter:
I can see this added to the config:
However, we're still seeing the hourly timeouts. (If I set the above to 60 seconds, I do get 60-second timeouts, so I think this is a timeout on the front end rather than the one we're having issues with.)
Yah, I'm not sure why the annotation doesn't work (I'll check), but just add it as an env variable like I previously mentioned... add it to the client-side proxy: ISTIO_META_IDLE_TIMEOUT
It adds correctly using the annotation but, unfortunately, I don't see any change to the timeout.
Ah okay, let me check the cluster config part. There may not be a way to do that at the moment. Will report back.
Thanks. N.B. We are using Istio 1.6.2 now.
I gotta be honest, I'm not able to reproduce this. I'm getting the idle_timeout enforced by the tcp_proxy correctly. Now, if you do have filters on the cluster as well, we should take a look at those... can you give me the JSON output of this cluster? Worst case, I can jump on a Zoom call with you and we can take a look together.
I'm having a similar issue on Istio 1.5.8. TCP connections are dropped for various service types (SSL, Postgres, Redis) after 1 hour. Like @nabbott2008, setting ISTIO_META_IDLE_TIMEOUT to 24h0m0s did not work, but 30s did work.
+1, we are seeing exactly the same issue as @courcelm described, on Istio 1.5.6.
I know Using an EnvoyFilter to set the cluster level
Envoy defaults all
We ran into the same problem on 1.7, but we noticed that the
We also created a similar filter to apply to the passthrough cluster (so that timeouts still apply to external traffic that we don't have service entries for), since the config wasn't being picked up there either.
I'm also having this 1-hour timeout issue with a Postgres client. Using Istio 1.6.7.
The ISTIO_META_IDLE_TIMEOUT env variable only applies to outbound traffic, so it is good for the ingress gateway, but not for the Pod annotation. For the service client side, we should use a DR + KA (DestinationRule + keepalive) configuration.
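For context, a minimal sketch of what such a DR + KA configuration generally looks like; the host name and keepalive timings below are placeholders, not the configuration used by anyone in this thread:

```yaml
# Sketch only: hypothetical upstream host and keepalive timings.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: redis-tcp-keepalive
spec:
  host: my-redis.default.svc.cluster.local   # placeholder upstream host
  trafficPolicy:
    connectionPool:
      tcp:
        tcpKeepalive:
          time: 600s        # start probing after 10 minutes of idleness
          interval: 75s     # interval between keepalive probes
          probes: 3         # probes before the connection is considered dead
```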
@Rayzhangtian We're having issues with this and connections to Postgres (similar to @jsabalos). Could you please share what DR + KA configs you used that solved the issue?
I think increasing or disabling
Here is my topology:
I am reducing the TCP proxy idle timeout to simulate the problem. It doesn't matter whether it's 3s, 1h, 24h, or 7d: if the client is idle longer than the threshold we set, it causes a broken pipe:
The first request succeeds; it creates the TCP proxy session.
After 3 seconds of idle timeout, Envoy seems to close the connection, but only towards the upstream Redis.
And the second request gets a 502.
And somehow, after the second request fails, the third request succeeds in creating the TCP proxy session again.
You can see the full log (service sidecar) here:
My concern is why the Envoy sidecar only terminates the connection to the upstream (in this case Redis), but does not terminate the downstream side (the application the sidecar runs alongside). This causes us a problem, since we need to configure all our applications to use a TCP idle timeout below the Envoy configuration. For example, with a 1-hour idle timeout (the default Envoy TCP proxy idle timeout), we must make sure every application that connects to Redis/PostgreSQL, or anything else going through the TCP proxy, has an idle timeout below 1 hour so the application can initiate the TCP closing handshake itself. We receive lots of complaints from devs.
FYI: Istio version: v1.9.9. @howardjohn do you know about this? Did we miss something?
Here is a diagram that might help to understand the issue. There are two expected behaviors that could solve this problem:
We tried to implement TCP keepalive, but this doesn't block the TCP proxy.
🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2021-11-02. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions. Created by the issue and PR lifecycle manager.
@ryant1986 Could you share the EnvoyFilter sample for the passthrough cluster here?
For anyone still facing broken pipe errors due to the upstream closing the connection: based on this comment, we ended up using an EnvoyFilter, but in our case setting the idleTimeout to 30s did the trick, not 24h. Further, the context was set to
We are on v1.19.5.
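A rough sketch of what such an EnvoyFilter can look like when patching the tcp_proxy idle_timeout; the resource name, namespace, context, and timeout value here are illustrative and may need adjusting for your mesh, and this is not the exact filter referenced above:

```yaml
# Sketch only: names, context, and timeout value are placeholders.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: tcp-idle-timeout
  namespace: istio-system            # root namespace applies it mesh-wide
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: SIDECAR_OUTBOUND      # adjust to the traffic direction you need
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.tcp_proxy
    patch:
      operation: MERGE
      value:
        name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          idle_timeout: 30s          # the value reported to work in the comment above
```

Note that the field is spelled idle_timeout in the raw Envoy proto, while idleTimeout is its camelCase JSON form; both refer to the same TcpProxy setting.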
Bug description
We are seeing
[Envoy (Epoch 0)] [2020-05-27 20:35:08.309][32][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:452] [C1340] idle timeout
in istio-proxy logs for TCP connections.
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior
As this is a TCP connection, we shouldn't be seeing these 60m timeouts from what I can see in the documentation - it looks as though the connection is being treated as an HTTP connection.
We upgraded recently from Istio 1.3.6 and we weren't seeing these issues.
Steps to reproduce the bug
Here is the service we're testing against:
Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)
The client uses jredis. We see the issue where the client is disconnected from Sentinel:
Lost connection to Sentinel at abc-redis-opensource-announce-0:26379. Sleeping 5000ms and retrying.
The issue that is causing us the most pain, though, is when we ask the Redis client to make a new request. It seems jredis creates a connection to Redis when it starts, and this connection is then timed out by Istio. This means that when any of our apps makes a new request through jredis, it no longer has an established connection to Redis and needs to create a new one. We see:
We are seeing similar issues between other apps but are finding those harder to replicate.
We weren't seeing these issues on 1.3.6, and we're still not seeing them on other clusters running that version.
How was Istio installed?
Using operator.
Environment where bug was observed (cloud vendor, OS, etc)
EKS