Bug description
We are seeing
[Envoy (Epoch 0)] [2020-05-27 20:35:08.309][32][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:452] [C1340] idle timeout
in istio-proxy logs for TCP connections.
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior
As this is a TCP connection, we shouldn't be seeing these 60m timeouts, from what I can see in the documentation. It looks as though the connection is being treated as an HTTP connection.
We recently upgraded from Istio 1.3.6, where we weren't seeing these issues.
Steps to reproduce the bug
Here is the service we're testing against:
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  creationTimestamp: "2020-05-26T07:08:07Z"
  labels:
    app: abc-opensource
    chart: abc-opensource-0.29.0
    heritage: Tiller
    product: redis
    release: abc-redis
  name: abc-redis-opensource-announce-0
  namespace: abc
  resourceVersion: "29305795"
  selfLink: /api/v1/namespaces/abc/services/abc-redis-opensource-announce-0
  uid: e3629883-79fe-4304-8cb1-9b544ed152ad
spec:
  clusterIP: 1.2.3.4
  ports:
  - name: tcp-server
    port: 6379
    protocol: TCP
    targetPort: redis
  - name: tcp-sentinel
    port: 26379
    protocol: TCP
    targetPort: sentinel
  publishNotReadyAddresses: true
  selector:
    app: abc-opensource
    release: abc-redis
    statefulset.kubernetes.io/pod-name: abc-redis-opensource-server-0
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)
client version: 1.5.4
egressgateway version: 1.5.4
ingressgateway version: 1.5.4
ingressgateway version: 1.5.4
ingressgateway-public version:
pilot version: 1.5.4
data plane version: 1.5.1 (32 proxies), 1.5.4 (173 proxies)
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:51Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-eks-e16311", GitCommit:"e163110a04dcb2f39c3325af96d019b4925419eb", GitTreeState:"clean", BuildDate:"2020-03-27T22:37:12Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Client: &version.Version{SemVer:"v2.16.7", GitCommit:"5f2584fd3d35552c4af26036f0c464191287986b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.6", GitCommit:"dd2e5695da88625b190e6b22e9542550ab503a47", GitTreeState:"clean"}
The client uses Jedis. We see the issue where the client is disconnected from Sentinel:
Lost connection to Sentinel at abc-redis-opensource-announce-0:26379. Sleeping 5000ms and retrying.
The issue causing us the most pain, though, appears when we ask the Redis client to make a new request. It seems Jedis creates a connection to Redis when it starts, and this connection is then timed out by Istio. This means that when any of our apps makes a new request through Jedis, there is no longer an established connection to Redis and a new one has to be created. We see
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Broken pipe (Write failed)","rootCause":"Broken pipe (Write failed)"}]}}
We are seeing similar issues between other apps, but are finding those harder to reproduce.
We weren't seeing these issues on 1.3.6, and we're still not seeing them on other clusters running that version.
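To time the disconnect independently of Jedis, a plain-socket probe along the following lines could be run from a pod with the sidecar injected. This is only a minimal sketch under assumptions, not something taken from our setup: the class `IdleProbe` and the method `millisUntilPeerClose` are hypothetical names, and the default host and port are simply the Service shown above. It opens a TCP connection, sends nothing, and reports how long it takes for the other side to close it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical helper for illustration; not part of Jedis, Istio, or Envoy.
class IdleProbe {

    /**
     * Opens a TCP connection, leaves it idle, and waits for the peer to close it.
     * Returns the elapsed milliseconds until the close was observed, or -1 if
     * nothing was read and no close was seen within maxWaitMillis (must be > 0).
     */
    static long millisUntilPeerClose(String host, int port, int maxWaitMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5_000);
            socket.setSoTimeout(maxWaitMillis); // upper bound for each blocking read
            long start = System.nanoTime();
            try {
                InputStream in = socket.getInputStream();
                while (in.read() != -1) {
                    // Discard any unsolicited bytes; only end-of-stream matters here.
                }
            } catch (SocketTimeoutException e) {
                return -1; // connection was still open when the wait expired
            } catch (SocketException e) {
                // A reset from the peer is treated the same as a clean close.
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions based on the Service in this report.
        String host = args.length > 0 ? args[0] : "abc-redis-opensource-announce-0";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6379;
        int maxWaitMillis = 65 * 60 * 1000; // a little longer than the 60m timeout

        long elapsed = millisUntilPeerClose(host, port, maxWaitMillis);
        if (elapsed < 0) {
            System.out.println("connection still open after " + maxWaitMillis + " ms");
        } else {
            System.out.println("connection closed by peer after " + elapsed + " ms");
        }
    }
}
```

If the probe reports a close at roughly 60 minutes on the 1.5.4 cluster but stays open on a 1.3.6 cluster, that would be consistent with the idle timeout described above. It would not by itself show which component closed the connection.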
How was Istio installed?
Using operator.
Environment where bug was observed (cloud vendor, OS, etc)
EKS