
Omitting listener for management address #1194

Closed
prune998 opened this issue Oct 20, 2017 · 28 comments

@prune998
Contributor

I have some Kubernetes Deployments/Services with Istio sidecars that are generating a lot of warnings:

istio-pilot-643415899-4d2cj discovery I1020 13:55:07.677134       1 discovery.go:332] Cleared discovery service cache
istio-pilot-643415899-4d2cj discovery W1020 13:55:07.962053       1 config.go:205] Omitting listener for management address tcp_10.20.1.40_12901 (tcp://10.20.1.40:12901) due to collision with service listener http_10.20.1.40_12901 (tcp://10.20.1.40:12901)
istio-pilot-643415899-4d2cj discovery W1020 13:55:09.006729       1 config.go:205] Omitting listener for management address tcp_10.20.1.40_12901 (tcp://10.20.1.40:12901) due to collision with service listener http_10.20.1.40_12901 (tcp://10.20.1.40:12901)

At the same time I get Envoy sidecar errors too:

useredged-istio-4276172085-z55qh istio-proxy [2017-10-20 14:08:56.313][14][warning][config] external/envoy/source/server/listener_manager_impl.cc:248] error adding listener: 'http_10.20.1.40_12901' has duplicate address '10.20.1.40:12901' as existing listener
useredged-istio-4276172085-z55qh istio-proxy [2017-10-20 14:08:56.314][14][warning][upstream] external/envoy/source/server/lds_subscription.cc:65] lds: fetch failure: error adding listener: 'http_10.20.1.40_12901' has duplicate address '10.20.1.40:12901' as existing listener

The deployment is pretty simple, with ports 12900 (gRPC) and 12901 (HTTP), in a testing namespace (staging).

apiVersion: v1
kind: Service
metadata:
  labels:
    app: useredged
    track: staging
  name: useredged
  namespace: staging
spec:
  ports:
  - name: http-ue
    port: 12901
    protocol: TCP
    targetPort: http-ue
  - name: grpc
    port: 12900
    protocol: TCP
    targetPort: grpc
  selector:
    app: useredged
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: useredged
    track: staging
  name: useredged-istio
  namespace: staging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: useredged
  template:
    metadata:
      labels:
        app: useredged
        track: staging
    spec:
      containers:
      - env:
        - name: GRPCPORT
          value: "12900"
        - name: HTTPPORT
          value: "12901"
        - name: DEBUG
          value: "false"
        image: useredge/useredged:4165890
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 12901
            scheme: HTTP
          initialDelaySeconds: 2
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 1
        name: useredged
        ports:
        - containerPort: 12900
          name: grpc
          protocol: TCP
        - containerPort: 12901
          name: http-ue
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 12901
            scheme: HTTP
          initialDelaySeconds: 2
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 1

The entrypoint in this image connects to a Kafka broker. If no Kafka broker is reachable, the application quits (and the container exits).

My feeling is that, when everything starts, the UserEdge application can't connect to Kafka because the proxy/network setup is not done yet. So the container quits and starts over. Then Istio or Envoy keeps the old listener in memory and thinks there is a duplicate...

As far as I can see, this is adding a LOT of warning logs, but the service is working as expected.
I've been looking around, trying to debug the state of the Envoy sidecar and the Istio mesh, but wasn't able to find anything useful...

Any help is welcome...

@prune998
Contributor Author

Istio 0.2.7, Kubernetes 1.7.8 on GKE, CoreOS nodes

@prune998
Contributor Author

Same with Istio 0.2.10.

I changed my application container so it won't restart when the Istio proxy is not yet ready; same behaviour...
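(For reference, a minimal sketch of that kind of wait-for-sidecar wrapper, assuming the default proxy admin port 15000; the /app/useredged path is just illustrative:)

#!/bin/sh
# Block until the Envoy sidecar's admin endpoint responds, so the app
# does not crash-loop while the proxy/network setup is still pending.
until curl -s -o /dev/null http://127.0.0.1:15000/server_info; do
  echo "waiting for istio-proxy..."
  sleep 1
done
exec /app/useredged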

@munnerz
Member

munnerz commented Oct 30, 2017

I've just deployed Istio onto an alpha GKE 1.8.1 cluster with RBAC and initializers enabled, and deployed the microservices-demo (specifically, only the front-end service). I'm seeing this same issue after creating just a Deployment (with a single pod) and a corresponding Service:

(from Pilot):

W1030 14:50:53.040394       1 config.go:202] Omitting listener for management address tcp_10.56.1.5_8079 (tcp://10.56.1.5:8079) due to collision with service listener tcp_10.56.1.5_8079 (tcp://10.56.1.5:8079)
W1030 14:50:53.174949       1 config.go:202] Omitting listener for management address tcp_10.56.1.5_8079 (tcp://10.56.1.5:8079) due to collision with service listener tcp_10.56.1.5_8079 (tcp://10.56.1.5:8079)
I1030 14:51:01.268708       1 controller.go:447] Handle endpoint front-end in namespace sock-shop
I1030 14:51:01.268755       1 discovery.go:332] Cleared discovery service cache
I1030 14:51:04.268675       1 controller.go:447] Handle endpoint front-end in namespace sock-shop
I1030 14:51:04.268699       1 discovery.go:332] Cleared discovery service cache

and from the istio sidecar:

[2017-10-30 16:01:57.440][17][warning][config] external/envoy/source/server/listener_manager_impl.cc:248] error adding listener: 'tcp_10.56.2.7_8079' has duplicate address '10.56.2.7:8079' as existing listener
[2017-10-30 16:01:57.440][17][warning][upstream] external/envoy/source/server/lds_subscription.cc:65] lds: fetch failure: error adding listener: 'tcp_10.56.2.7_8079' has duplicate address '10.56.2.7:8079' as existing listener

(the latter is printing messages every 2 seconds).

This also causes my application pod to crash periodically and to go in and out of service (according to Kubernetes).

@munnerz
Member

munnerz commented Oct 30, 2017

Ah, after removing the livenessProbe and readinessProbe from my application Deployment, it seems I'm no longer getting this problem.

This is roughly in line with this paragraph in the quick-start doc:

a) Install Istio without enabling mutual TLS authentication between sidecars.
Choose this option for clusters with existing applications, applications where
services with an Istio sidecar need to be able to communicate with other non-Istio
Kubernetes services, and applications that use liveliness and readiness probes,
headless services, or StatefulSets.

(to clarify, my deployment does have mTLS enabled)

mandarjog pushed a commit to mandarjog/istio that referenced this issue Oct 30, 2017
mandarjog pushed a commit that referenced this issue Oct 31, 2017
@prune998
Contributor Author

prune998 commented Nov 2, 2017

@munnerz I'm not using Auth (mTLS) at all. This may explain your issues, as well as a badly defined probe.

My issue is different: whatever I deploy, I get the collision log every second, and it only stops after the istio-proxy reloads (shutting down parent after drain).

@tagarwal

Seeing this with 0.2.12 as well.
I don't have mTLS enabled.
The services were kube-injected into the mesh.
I'm seeing these errors in istio-pilot:

W1115 19:41:57.903111       7 config.go:202] Omitting listener for management address tcp_10.99.63.20_7500 (tcp://10.99.63.20:7500) due to collision with service listener tcp_10.99.63.20_7500 (tcp://10.99.63.20:7500)
W1115 19:41:58.032321       7 config.go:202] Omitting listener for management address tcp_10.99.63.13_3000 (tcp://10.99.63.13:3000) due to collision with service listener tcp_10.99.63.13_3000 (tcp://10.99.63.13:3000)
W1115 19:41:58.070237       7 config.go:202] Omitting listener for management address tcp_10.99.47.11_8080 (tcp://10.99.47.11:8080) due to collision with service listener tcp_10.99.47.11_8080 (tcp://10.99.47.11:8080)
W1115 19:41:58.655615       7 config.go:202] Omitting listener for management address tcp_10.99.63.20_7500 (tcp://10.99.63.20:7500) due to collision with service listener tcp_10.99.63.20_7500 (tcp://10.99.63.20:7500)
W1115 19:41:58.708443       7 config.go:202] Omitting listener for management address tcp_10.99.47.11_8080 (tcp://10.99.47.11:8080) due to collision with service listener tcp_10.99.47.11_8080 (tcp://10.99.47.11:8080)
W1115 19:41:58.732167       7 config.go:202] Omitting listener for management address tcp_10.99.63.12_8080 (tcp://10.99.63.12:8080) due to collision with service listener tcp_10.99.63.12_8080 (tcp://10.99.63.12:8080)
W1115 19:41:58.814653       7 config.go:202] Omitting listener for management address tcp_10.99.63.13_3000 (tcp://10.99.63.13:3000) due to collision with service listener tcp_10.99.63.13_3000 (tcp://10.99.63.13:3000)
W1115 19:41:58.856772       7 config.go:202] Omitting listener for management address tcp_10.99.63.12_8080 (tcp://10.99.63.12:8080) due to collision with service listener tcp_10.99.63.12_8080 (tcp://10.99.63.12:8080)

As a result, calls to the service are not being routed through the proxy:

curl -vvvv -L localhost:9090/api/v1/namespaces/default/services/aura-admin-service:admin-service/proxy/console
*   Trying ::1...
* connect to ::1 port 9090 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET /api/v1/namespaces/default/services/aura-admin-service:admin-service/proxy/console HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Content-Length: 173
< Content-Security-Policy: default-src 'self'
< Content-Type: text/html; charset=UTF-8
< Date: Wed, 15 Nov 2017 20:09:58 GMT
< Location: /api/v1/namespaces/default/services/aura-admin-service:admin-service/proxy/console/
< Server: nginx/1.13.6
< X-Content-Type-Options: nosniff
< X-Powered-By: Express

As we can see, the Server header is nginx rather than envoy.

@tagarwal

One more point, if it helps: when we do kube-inject prior to deployment in Kubernetes, it works for me:

  kubectl apply -f <(istioctl kube-inject -f deployment.yaml)

But if this is done instead:

  1. kubectl apply -f deployment.yaml
  2. kubectl get deployment xyz -o yaml | istioctl kube-inject -f - | kubectl apply -f -

then I start seeing the collision in the istio-proxy container log.

@kyessenov
Contributor

Unfortunately, there is an issue with liveness/health probes on the pod ports that are used for the service. The warning you see is about the case when the pod port is exposed as a service, and also used as a liveness/health probe. The simplest workaround (that also works for mTLS mode) is to use exec probes with curl, until we fix this bug properly in the upcoming releases.
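A minimal sketch of that exec-probe workaround, assuming the app from the Deployment above serves /healthz on port 12901 and the image ships curl:

livenessProbe:
  exec:
    command: ["curl", "-f", "http://127.0.0.1:12901/healthz"]
  initialDelaySeconds: 2
  periodSeconds: 15
readinessProbe:
  exec:
    command: ["curl", "-f", "http://127.0.0.1:12901/healthz"]
  initialDelaySeconds: 2
  periodSeconds: 15

Because this is an exec command rather than an httpGet/tcpSocket probe, no separate management listener is generated for the probe port, and the check runs inside the pod, which is why (as noted above) it also works in mTLS mode.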

@tagarwal

@kyessenov Thanks for the reply.
Would restarting the pod help in any way for now?

@prune998
Contributor Author

@kyessenov I had a strange (new) behaviour linked to this bug...
Of course I still have logs like:

[2017-11-22 16:48:38.649][14][warning][config] external/envoy/source/server/listener_manager_impl.cc:248] error adding listener: 'http_10.20.16.52_24500' has duplicate address '10.20.16.52:24500' as existing listener
[2017-11-22 16:48:38.653][14][warning][upstream] external/envoy/source/server/lds_subscription.cc:65] lds: fetch failure: error adding listener: 'http_10.20.16.52_24500' has duplicate address '10.20.16.52:24500' as existing listener

But also, it's like there is a network issue between istio-proxy and my processes.

I first thought it was due to a liveness/readiness issue with my pod, but from the K8s side it's OK:

Conditions:
  Type		Status
  Initialized 	True
  Ready 	True
  PodScheduled 	True

So everything seems up and I see a live connection between the proxy and the process, but still nothing goes through.

Everything is back to normal as soon as I remove the liveness and readiness probes.

Do you know if another issue is open regarding this?
Could you give a bit more info about this issue?
Looking at the Envoy config as generated by SDS/RDS, I see nothing wrong that could cause this bug. Do you have any pointers?
Maybe I could try to quick-patch this with your help?

Thanks
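For anyone debugging the same thing: one way to see which listeners Envoy actually accepted, assuming the default sidecar admin port 15000 from the injected proxy args, is to query the admin endpoint from inside the pod:

kubectl exec -it <pod> -c istio-proxy -- curl -s http://127.0.0.1:15000/listeners

Comparing that output with the LDS warnings above shows which listener was rejected as a duplicate.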

@bobbytables
Contributor

I'm not sure if this is related, but we're seeing a lot of these same log lines (lds: fetch failure: error adding listener). One pod can't connect to the gRPC service (both are in the service mesh); it's receiving a 503. If I tail the istio-proxy logs (on the client pod) I can see it:

(screenshot of the istio-proxy logs omitted)

However, I don't see it on the server end.

@tjquinno

tjquinno commented Jan 4, 2018

@kyessenov (or anyone else) Has there been any progress on fixing this (other than the previously-mentioned workarounds)?

Thanks.

@phanama

phanama commented Jan 19, 2018

*bumps up

@emedina

emedina commented Jan 21, 2018

Still happening in 0.4.0

@costinm
Contributor

costinm commented Feb 21, 2018

As an update (and possibly to close this issue):

If you set a TCP/HTTP readiness/liveness probe on the 'main' port,
i.e. the port other services use to call the app:

  1. If mTLS is off, liveness and readiness probes should work just fine.

  2. If mTLS is on, we don't currently have any good way to support HTTP
    or TCP probes on the mTLS port. There are a few bad ways: setting mTLS
    implies connections are secured, and anything we do to downgrade this
    is likely to have security implications.

  3. In either case, the liveness/readiness probes will go through the sidecar
    and Mixer and will be subject to policy/telemetry.

We are working on a patch to auto-detect TLS for some upgrade use
cases, but it would be dangerous to auto-enable this for the primary
port or to use it long term, since it would defeat the purpose of having
mTLS enabled.

IMO the best option is to use a separate port for liveness/readiness probes.
Istio will set up a plain TCP proxy for it, without mTLS or Mixer, so the probe
should work in all cases, with or without Istio. Unfortunately this requires
small code changes in the app.
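A minimal sketch of that separate-port approach (the port 8081 and the /healthz path are illustrative, not taken from this thread); the probe targets a port that is not declared in the Service, so it gets its own plain listener instead of colliding with the service listener:

containers:
- name: useredged
  ports:
  - containerPort: 12901   # serving port, exposed via the Service
    name: http-ue
    protocol: TCP
  - containerPort: 8081    # health-only port, NOT listed in the Service
    name: http-health
    protocol: TCP
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8081
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8081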

If it is not possible to change the app to use a separate port, the 'exec' probe
is the next option. It won't work well if your app image is based on 'scratch',
but you can exec in the sidecar container, which has curl.

Finally, we can (and plan to) add an extra /healthz to the sidecar, with an
associated liveness/readiness probe. Once this is in, it could also query the
main app (over HTTP or TCP); however, the injector will need more complicated
changes to rewrite the app's liveness probe to use the sidecar's port and URL.
I think it's a pretty tricky change and I would plan it post-1.0, but if anyone
really needs it we can re-evaluate.

Regarding warnings/messages: we should reword them a bit and avoid repeating,
but they're mostly harmless.

@prune998
Contributor Author

@costinm, it's still not working, even without TLS (point No 1).

@reynaldiwijaya

reynaldiwijaya commented Feb 26, 2018

Hi, we ran into what seems to be the same problem, where we got this message:

[2018-02-26 09:43:44.416][12][warning][upstream] external/envoy/source/server/lds_subscription.cc:68] lds: fetch failure: error adding listener: 'http_100.96.15.81_8080' has duplicate address '100.96.15.81:8080' as existing listener
[2018-02-26 09:43:45.866][12][warning][config] external/envoy/source/server/listener_manager_impl.cc:245] error adding listener: 'http_100.96.15.81_8080' has duplicate address '100.96.15.81:8080' as existing listener

This happens every time we deploy our application, which consists of two containers: our application and istio-proxy. We do have HTTP liveness and readiness probes for this application. After some time, the log prints a segmentation fault message and the proxy process restarts, which somehow makes it work after that.

Segmentation fault message

[2018-02-26 09:43:55.424][12][warning][config] external/envoy/source/server/listener_manager_impl.cc:245] error adding listener: 'http_100.96.15.81_8080' has duplicate address '100.96.15.81:8080' as existing listener
[2018-02-26 09:43:55.426][12][warning][upstream] external/envoy/source/server/lds_subscription.cc:68] lds: fetch failure: error adding listener: 'http_100.96.15.81_8080' has duplicate address '100.96.15.81:8080' as existing listener
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:101] Caught Segmentation fault, suspect faulting address 0xffffffff00000018
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:85] Backtrace obj</usr/local/bin/envoy> thr<15> (use tools/stack_decode.py):
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #0 0x4ad3e2
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #1 0x4aca3a
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #2 0x4acb1e
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #3 0x4770da
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #4 0x463872
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #5 0x463c22
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #6 0x457bbc
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #7 0x447485
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #8 0x44cd2d
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #9 0x6ba956
[2018-02-26 09:43:55.428][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #10 0x6b8223
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #11 0x6b8a09
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #12 0x44a3c1
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #13 0x6f24db
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #14 0x6f155e
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #15 0x6f2829
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #16 0x44a3c1
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #17 0x6fa0b2
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #18 0x6fa0e1
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #19 0xc6e6c7
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #20 0xc6ec2e
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #21 0x6f2439
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #22 0x6e4f27
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #23 0x6e4abb
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #24 0x6e5606
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #25 0x44a3c1
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #26 0xc78b9f
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #27 0xc78bc4
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<15> obj</lib/x86_64-linux-gnu/libpthread.so.0>
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #28 0x7fc30a96e6b9
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<15> obj</lib/x86_64-linux-gnu/libc.so.6>
[2018-02-26 09:43:55.429][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<15> #29 0x7fc30a39b3dc
[2018-02-26 09:43:55.430][15][critical][backtrace] bazel-out/local-dbg/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:97] end backtrace thread 15
ERROR: logging before flag.Parse: W0226 09:43:55.433003       1 agent.go:204] Epoch 0 terminated with an error: signal: segmentation fault

When we removed the probes, it started to work properly right after the deployment finished.

Env:
Kubernetes 1.9.2
Istio 0.4.0

May I confirm that this is the same problem as the one in this issue?

@prune998
Contributor Author

I never had a segfault, but the error adding listener seems to confirm the same issue.
@reynaldiwijaya, could you please give more information on your setup, what your deployment looks like, and what the Envoy config is before the crash?

@reynaldiwijaya

Hi @prune998

There was no config difference between before and after the crash. Envoy just started to work after the crash. Here is the config:

{
  "listeners": [],
  "lds": {
    "cluster": "lds",
    "refresh_delay_ms": 1000
  },
  "admin": {
    "access_log_path": "/dev/stdout",
    "address": "tcp://127.0.0.1:15000"
  },
  "cluster_manager": {
    "clusters": [
      {
        "name": "rds",
        "connect_timeout_ms": 10000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://istio-pilot.istio-system:15003"
          }
        ]
      },
      {
        "name": "lds",
        "connect_timeout_ms": 10000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://istio-pilot.istio-system:15003"
          }
        ]
      },
      {
        "name": "zipkin",
        "connect_timeout_ms": 10000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://jaeger-collector.services:9411"
          }
        ]
      }
    ],
    "sds": {
      "cluster": {
        "name": "sds",
        "connect_timeout_ms": 10000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://istio-pilot.istio-system:15003"
          }
        ]
      },
      "refresh_delay_ms": 1000
    },
    "cds": {
      "cluster": {
        "name": "cds",
        "connect_timeout_ms": 10000,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://istio-pilot.istio-system:15003"
          }
        ]
      },
      "refresh_delay_ms": 1000
    }
  },
  "statsd_udp_ip_address": "100.65.84.137:9125",
  "tracing": {
    "http": {
      "driver": {
        "type": "zipkin",
        "config": {
          "collector_cluster": "zipkin",
          "collector_endpoint": "/api/v1/spans"
        }
      }
    }
  }
}

As for our setup, we create our deployment (a ReplicaSet in this case) with Spinnaker. Istio is installed without TLS (plain istio, not istio-auth). The Istio sidecar is injected using the Istio initializer.

Here is the ReplicaSet YAML:

apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  annotations:
    sidecar.istio.io/status: injected-version-0.4.0
  creationTimestamp: 2018-02-26T10:24:54Z
  generation: 3
  labels:
    app: tools
    cluster: tools-dev-b2b
    detail: b2b
    load-balancer-tools: "true"
    tools-dev-b2b-v035: "true"
    replication-controller: tools-dev-b2b-v035
    stack: dev
    version: "35"
  name: tools-dev-b2b-v035
  namespace: dev
  resourceVersion: "27669473"
  selfLink: /apis/extensions/v1beta1/namespaces/dev/replicasets/tools-dev-b2b-v035
  uid: 492355d1-1adf-11e8-adc9-0695b0d11f20
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tools
      cluster: tools-dev-b2b
      detail: b2b
      tools-dev-b2b-v035: "true"
      replication-controller: tools-dev-b2b-v035
      stack: dev
      version: "35"
  template:
    metadata:
      annotations:
        sidecar.istio.io/status: injected-version-0.4.0
      creationTimestamp: null
      labels:
        app: tools
        cluster: tools-dev-b2b
        detail: b2b
        load-balancer-tools: "false"
        tools-dev-b2b-v035: "true"
        replication-controller: tools-dev-b2b-v035
        stack: dev
        version: "35"
    spec:
      containers:
      - image: docker.xxxx.xx/l/tools:dev
        imagePullPolicy: Always
        name: tools
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - proxy
        - sidecar
        - -v
        - "2"
        - --configPath
        - /etc/istio/proxy
        - --binaryPath
        - /usr/local/bin/envoy
        - --serviceCluster
        - tools
        - --drainDuration
        - 45s
        - --parentShutdownDuration
        - 1m0s
        - --discoveryAddress
        - istio-pilot.istio-system:15003
        - --discoveryRefreshDelay
        - 1s
        - --zipkinAddress
        - jaeger-collector.services:9411
        - --connectTimeout
        - 10s
        - --statsdUdpAddress
        - istio-mixer.istio-system:9125
        - --proxyAdminPort
        - "15000"
        - --controlPlaneAuthPolicy
        - NONE
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: docker.io/istio/proxy_debug:0.4.0
        imagePullPolicy: IfNotPresent
        name: istio-proxy
        resources: {}
        securityContext:
          privileged: true
          readOnlyRootFilesystem: false
          runAsUser: 1337
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/istio/proxy
          name: istio-envoy
        - mountPath: /etc/certs/
          name: istio-certs
          readOnly: true
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: docker-io
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir:
          medium: Memory
        name: istio-envoy
      - name: istio-certs
        secret:
          defaultMode: 420
          secretName: istio.default
status:
  availableReplicas: 1
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1

@costinm
Contributor

costinm commented Feb 27, 2018

Re. segfault: it seems you're using 0.4.0; I would recommend upgrading to a more recent version. It doesn't look related to the liveness probe.

@costinm
Contributor

costinm commented Feb 27, 2018

@prune998 can you provide more details, like a snippet of the YAML with the readiness probe, the service definition, etc.? I can try to reproduce it.

@reynaldiwijaya

@costinm @prune998
I have just updated our Istio to 0.5.1. The segmentation fault is now gone, but the duplicate listener issue is not, once we turn the liveness probe back on.

I have just noticed that this issue might be related to #2628, which is more straightforward about the problem being the probe.

Describing the pods gives me this error: Readiness probe failed: dial tcp 100.96.19.188:8080: getsockopt: connection refused.

This is how the probes are defined:

Liveness:     http :8080 delay=0s timeout=5s period=10s #success=1 #failure=3
Readiness:    http :8080 delay=0s timeout=5s period=10s #success=1 #failure=3

It might be a wild guess, but could it be possible that Envoy actually tries to create another 8080 listener even though one is already defined in the pod spec for the application container? I have not read the Envoy code, so I am not sure about this.
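(In pod-spec terms those probes correspond to roughly the following; this is a reconstruction, not the actual manifest, and the path is not shown in the describe output. Both probes target the same port 8080 that the container and Service expose, which is exactly the collision case described earlier:)

livenessProbe:
  httpGet:
    port: 8080
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    port: 8080
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 3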

@reynaldiwijaya

any update on this issue? @costinm @prune998

@wattli
Contributor

wattli commented Mar 21, 2018

"We explicitly create listeners for all health check ports. We also create listeners for all serving ports from the service spec. When the health/liveness port is the same as the normal serving port for the server, we emit an error (since we can't have two listeners on the same port). So long as the health check is HTTP, and the server is serving HTTP on the duplicated port, there's no problem and health checking passes (see the original issue, I posted a github repo showing this works; Tao has also verified this AFAIK)." Quote from @ZackButcher .

@costinm, please help reassign this to the proper person to fix.

@wbauern

wbauern commented Apr 5, 2018

I understand that a fix is in the works for this and that the "duplicate address" message is a red herring when mTLS is not being used. What would be the best way to filter out these messages from the istio-proxy logs? It is adding a lot of noise and unneeded data being sent to CloudWatch for us. We are using Istio 0.7.1.

Thanks!

@ZackButcher
Contributor

Unfortunately today I don't believe there's a way to disable just those log lines. However, in the upcoming 0.8 release you'll have the ability to use the Envoy v2 APIs. The code in Pilot pushing that data should avoid the duplicate port issue => no more logs in that style. Sorry there's not a more immediate solution.
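Until then, one stopgap (just a suggestion, not an Istio feature) is to drop those lines at collection time, for example when tailing the sidecar:

kubectl logs -f <pod> -c istio-proxy | grep -v "duplicate address"

The same kind of drop filter on "duplicate address" / "lds: fetch failure" can be applied in whatever log shipper forwards the container logs to CloudWatch.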

@sakshigoel12
Contributor

@ZackButcher @wattli given the 0.8 release is out, can this issue be closed?

@rshriram
Member

Closing the issue since the fixes went out in 0.8. Please reopen if the issue persists.

0x01001011 pushed a commit to thedemodrive/istio that referenced this issue Jul 16, 2020