Intermittent broken pipe #24387

@nabbott2008

Bug description
We are seeing the following idle timeout message in the istio-proxy logs for TCP connections:
[Envoy (Epoch 0)] [2020-05-27 20:35:08.309][32][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:452] [C1340] idle timeout
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior
As this is a TCP connection, we shouldn't be seeing these 60m timeouts, from what I can see in the documentation. It looks as though the connection is being treated as an HTTP connection.
We recently upgraded from Istio 1.3.6, where we weren't seeing these issues.
Steps to reproduce the bug
Here is the service we're testing against:

kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  creationTimestamp: "2020-05-26T07:08:07Z"
  labels:
    app: abc-opensource
    chart: abc-opensource-0.29.0
    heritage: Tiller
    product: redis
    release: abc-redis
  name: abc-redis-opensource-announce-0
  namespace: abc
  resourceVersion: "29305795"
  selfLink: /api/v1/namespaces/abc/services/abc-redis-opensource-announce-0
  uid: e3629883-79fe-4304-8cb1-9b544ed152ad
spec:
  clusterIP:  1.2.3.4
  ports:
  - name: tcp-server
    port: 6379
    protocol: TCP
    targetPort: redis
  - name: tcp-sentinel
    port: 26379
    protocol: TCP
    targetPort: sentinel
  publishNotReadyAddresses: true
  selector:
    app: abc-opensource
    release: abc-redis
    statefulset.kubernetes.io/pod-name: abc-redis-opensource-server-0
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

client version: 1.5.4
egressgateway version: 1.5.4
ingressgateway version: 1.5.4
ingressgateway version: 1.5.4
ingressgateway-public version:
pilot version: 1.5.4
data plane version: 1.5.1 (32 proxies), 1.5.4 (173 proxies)
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:51Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-eks-e16311", GitCommit:"e163110a04dcb2f39c3325af96d019b4925419eb", GitTreeState:"clean", BuildDate:"2020-03-27T22:37:12Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Client: &version.Version{SemVer:"v2.16.7", GitCommit:"5f2584fd3d35552c4af26036f0c464191287986b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.6", GitCommit:"dd2e5695da88625b190e6b22e9542550ab503a47", GitTreeState:"clean"}

The client uses Jedis. We see the issue where the client is disconnected from Sentinel:
Lost connection to Sentinel at abc-redis-opensource-announce-0:26379. Sleeping 5000ms and retrying.
The issue causing us the most pain, though, occurs when we ask the Redis client to make a new request. It seems Jedis creates a connection to Redis when it starts, and Istio then times this connection out. When any of our apps later makes a request through Jedis, it no longer has an established connection to Redis and needs to create a new one. We see:

redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Broken pipe (Write failed)","rootCause":"Broken pipe (Write failed)"}]}}
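To illustrate the failure mode described above, here is a minimal, self-contained sketch in Python (standard library only). It does not involve Istio, Envoy, or Jedis: a local stand-in server closes any connection that stays idle past a timeout, and the client then writes on the stale socket. The names `idle_closing_server` and `write_after_idle` and the 0.2 s timeout are assumptions made for the demo, not values taken from our setup.

```python
import socket
import threading
import time

# Assumed stand-in for the proxy's idle timeout, shortened for the demo.
IDLE_TIMEOUT_S = 0.2


def idle_closing_server(listener: socket.socket) -> None:
    """Accept one connection and close it if no bytes arrive in time."""
    conn, _ = listener.accept()
    conn.settimeout(IDLE_TIMEOUT_S)
    try:
        conn.recv(1024)
    except socket.timeout:
        pass  # idle timeout reached: fall through and close the connection
    finally:
        conn.close()


def write_after_idle(max_writes: int = 20) -> str:
    """Connect, stay idle past the timeout, then write until an error occurs.

    Returns the name of the exception raised by the write, or "no error".
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral local port
    listener.listen(1)
    server = threading.Thread(
        target=idle_closing_server, args=(listener,), daemon=True
    )
    server.start()

    client = socket.create_connection(listener.getsockname())
    try:
        # Stay idle until the server side has timed out and closed.
        server.join(timeout=5)
        time.sleep(0.1)
        for _ in range(max_writes):
            try:
                # The first write on a stale socket often succeeds locally;
                # the error usually surfaces on a later write.
                client.sendall(b"PING\r\n")
            except OSError as exc:
                return type(exc).__name__
            time.sleep(0.05)
        return "no error"
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(write_after_idle())
```

On Linux the write typically fails with `BrokenPipeError`, which corresponds to the `Broken pipe (Write failed)` that Java reports; the exact error can vary by platform.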

We are seeing similar issues between other apps, but those are harder to replicate.
We weren't seeing these issues on 1.3.6, and we still don't see them on other clusters running that version.
How was Istio installed?
Using operator.
Environment where bug was observed (cloud vendor, OS, etc)
EKS

Labels: area/networking, kind/need more info, lifecycle/automatically-closed, lifecycle/stale
