
Upstream error [503] since sidecar istio-proxy cannot connect to pilot #6085

Closed
johnzheng1975 opened this issue Jun 7, 2018 · 49 comments

Comments

@johnzheng1975
Member

johnzheng1975 commented Jun 7, 2018

istio 0.8.0 (mTLS disabled, also no control plane security)
k8s 1.9.5
cilium 1.1.0

Steps:
Redeploy a service with a new version.

Expected result:
The service can be accessed through istio-ingress.

Actual result:
It shows a 503 upstream error.

The attached sidecar istio-proxy log shows that it cannot connect to istio-pilot.

[2018-06-07 23:03:32.170][16][info][main] external/envoy/source/server/server.cc:396] starting main dispatch loop
[2018-06-07 23:03:37.170][16][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:235] gRPC config stream closed: 14, no healthy upstream
[2018-06-07 23:03:37.170][16][warning][upstream] 

UPSTREQM-istio-proxyerror.log
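
For anyone triaging the same symptom, a quick sanity check is to confirm that pilot is actually up and that its discovery Service exists before digging into the sidecar (a sketch only, assuming the default istio-system namespace and the stock istio=pilot label):

# Is pilot running, and is its discovery service present?
kubectl -n istio-system get pods -l istio=pilot
kubectl -n istio-system get svc istio-pilot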

@ryanyard

Same issue:
istio 0.8.0
gke: 1.10

Deploy using quick start:
kubectl apply -f install/kubernetes/istio-demo-auth.yaml

Deploy sample app:
https://istio.io/docs/guides/bookinfo/

Test:
curl -o /dev/null -s -w "%{http_code}\n" http://${GATEWAY_URL}/productpage

Returns:
upstream connect error or disconnect/reset before headers

@LinAnt

LinAnt commented Jun 12, 2018

I have the same issue; the dotwiz looks quite interesting:
https://www.dropbox.com/s/oba25uv4fnwgls7/Screenshot%202018-06-12%2013.58.48.png?dl=0

I am trying to upgrade pgadmin on a fresh cluster.

@andraxylia
Contributor

@johnzheng1975 is the log you attached from the istio-proxy by the pilot? Did you use ingress or gateway? Could you please send logs from the gateway/ingress pods as well as from pilot, along with the output of kubectl get all?
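
For reference, those logs can usually be collected with something along these lines (a sketch only; it assumes the default istio-system namespace and the stock component labels, so adjust to your install):

# Pilot (discovery) logs
kubectl -n istio-system logs -l istio=pilot -c discovery

# Legacy ingress / ingressgateway logs
kubectl -n istio-system logs -l istio=ingress
kubectl -n istio-system logs -l istio=ingressgateway

# Cluster-wide resource overview
kubectl get all --all-namespaces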

@andraxylia andraxylia added this to the 1.0 milestone Jun 13, 2018
@rlenglet
Contributor

is the log you attached from the istio-proxy by the pilot?

istio-proxy

Did you use ingress or gateway?

ingress

Could you please send logs from the gateway/ingress pods as well as from pilot, along with the output of kubectl get all?

@johnzheng1975
Member Author

johnzheng1975 commented Jun 15, 2018

@andraxylia For the logs you requested, I will send them all the next time I hit this issue.
@rlenglet Thanks for answering for me.

@cizixs
Contributor

cizixs commented Aug 28, 2018

Any update on this issue? Same issue here.

@dwdraju

dwdraju commented Aug 30, 2018

Similar issue here:

[2018-08-30 09:19:04.878][15][info][upstream] external/envoy/source/server/lds_api.cc:80] lds: add/update listener '0.0.0.0_8060'
[2018-08-30 09:19:04.879][15][info][upstream] external/envoy/source/server/lds_api.cc:80] lds: add/update listener 'virtual'
[2018-08-30 09:19:04.900][15][info][config] external/envoy/source/server/listener_manager_impl.cc:908] all dependencies initialized. starting workers
[2018-08-30 09:20:04.894][15][info][main] external/envoy/source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-08-30 09:25:36.297][15][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 13, 
[2018-08-30 09:30:36.594][15][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 13, 
[2018-08-30 09:35:37.039][15][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 13,

@jojimt

jojimt commented Oct 16, 2018

I am hitting this issue as well. Is there a workaround? Or an older version that does not exhibit this issue?

@zparnold

zparnold commented Nov 1, 2018

Oh no, sorry to say that I am joining the party here. Seems to only occur when a deployment is in flight. Do I have a messed up policy config? I have the same setup as @johnzheng1975

@costinm
Contributor

costinm commented Nov 2, 2018

The logs seem to show that pilot closes the connection every 5 minutes - but the proxy reconnects immediately afterwards. It's actually a feature (an accidental one) - it lets pilot connections get re-balanced. We're working on a better way to rebalance, and after that we'll fix this 5-minute reconnect.

AFAIK it should not cause any problems.
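
To check whether the periodic disconnects you see are just this harmless rebalancing, you can grep the sidecar log for the two warnings quoted earlier in this thread (a sketch; the pod name and namespace are placeholders):

kubectl -n <app-namespace> logs <app-pod> -c istio-proxy | grep -E "gRPC config stream closed|Unable to establish new stream"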

@jojimt

jojimt commented Nov 2, 2018

@costinm re: "AFAIK it should not cause any problems." I am getting 503 while accessing the bookinfo sample application. If this message is not indicative of the problem, is there something else I can check?

Relevant logs are below:

Gateway:
[2018-10-15T20:55:29.665Z] "GET /productpage HTTP/1.1" 503 UF 0 57 1003 - "1.100.101.13" "curl/7.54.0" "05e952c1-9a46-9f3a-9a88-64b777f7e0ec" "172.28.184.179:33637" "10.2.97.32:9080"

productpage sidecar:
[2018-10-15 21:07:29.445][23][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, no healthy upstream
[2018-10-15 21:07:29.445][23][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:41] Unable to establish new stream

@JAtula

JAtula commented Nov 6, 2018

Just hit the same issue; a workaround would be awesome, if somebody knows one.

@jsw

jsw commented Nov 6, 2018

I just ran into this on Istio 1.0.3. Not sure if I saw this on previous versions. Deleting the istio-pilot pods seems to help, but is probably only a temporary fix. The pods were only a day old (upgraded Istio 1.0.2 -> 1.0.3 yesterday) and I didn't notice anything obviously bad in the pilot dashboard. Perhaps the recent activity in this issue is people running 1.0.3?
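
For anyone looking for the temporary workaround mentioned above, restarting pilot amounts to deleting its pods and letting the Deployment recreate them (a sketch, assuming the default istio-system namespace and istio=pilot label):

kubectl -n istio-system delete pod -l istio=pilot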

@courcelm

courcelm commented Nov 7, 2018

I'm also hitting UNAVAILABLE:upstream connect error or disconnect/reset before headers on Istio 1.0.3

@JAtula

JAtula commented Nov 7, 2018

I just ran into this on Istio 1.0.3. Not sure if I saw this on previous versions. Deleting the istio-pilot pods seems to help, but is probably only a temporary fix. The pods were only a day old (upgraded Istio 1.0.2 -> 1.0.3 yesterday) and I didn't notice anything obviously bad in the pilot dashboard. Perhaps the recent activity in this issue is people running 1.0.3?

Yeah, deleting pilot seems to fix the issue, but it's not really ideal 😄

P.S. Running 1.0.3 as well.

@cchanley2003

cchanley2003 commented Nov 7, 2018

I am seeing this with 1.0.3 as well. My scenario:

An nginx proxy sits in front of Apache httpd. I have two Apaches deployed with different versions of a web app. All pods are within the service mesh and have automatically injected sidecars. I have a VirtualService sitting in front of the Apache. Any time I restart the Apache pods I see the disconnect/reset errors. Restarting pilot or applying new VirtualService routes to the Apache virtual service fixes the issue.

Any logs I should look for?
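
One thing worth grepping for is 503 responses with the UF (upstream connection failure) flag in the istio-proxy access logs of the gateway and of the calling pod, since UF usually points at stale or unreachable endpoints (a sketch; pod names and namespaces are placeholders):

# Gateway access log
kubectl -n istio-system logs <ingressgateway-pod> | grep "503 UF"

# Sidecar access log of the calling pod (nginx in this setup)
kubectl -n <app-namespace> logs <nginx-pod> -c istio-proxy | grep "503 UF"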

@sysC0D

sysC0D commented Nov 8, 2018

Same problem on Istio 1.0.3.
In most cases, reapplying our configuration files fixes this error, but not for long.
Maybe it's the traffic volume? 4000-5000 req/s on the ingress-gateway.

@wansuiye

wansuiye commented Nov 9, 2018

Same problem on Istio 1.0.3; QPS is 40~50.

@frankbu frankbu removed their assignment Nov 9, 2018
@emedina

emedina commented Dec 10, 2018

Possibly related to #10360

@novakov-alexey-zz

Similar behaviour with 1.0.5. Restarting the pilot pod helps.

@gowthamreddyvintha

I am facing a similar issue with 1.0.5. Even restarting the pilot pod didn't solve my problem.

@kish3007

I am facing the same issue, restarting pilot doesn't seem to help.

@m1o1

m1o1 commented Jan 22, 2019

Is the problem reproducible with Minikube? Is it hosted? If it only happens on a hosted platform it might be related to an issue I was having.

@kish3007

My issue is on GKE, running Istio 1.0.5 (mTLS not enabled); I noticed this when trying to access one of my services (HTTPS) through the Gateway.

@gowthamreddyvintha

My issue is on AKS, running Istio 1.0.5 (mTLS not enabled). The sample Bookinfo app runs fine, but when I deploy my own application (a simple web app), the VirtualService does not route to the pod's route path.

@liutanrong

liutanrong commented Jan 26, 2019

I probably found the cause of the problem.

The reason

I found that the cluster config in istio-proxy contains a Kubernetes pod IP that no longer exists.
I can also see from the istio-proxy error log that all the request traffic failing with 503 UF is sent to this non-existent IP, so I think this is the cause of the problem.

How to solve this

In my case, I solved it by applying the DestinationRule again, and Istio then synced the correct cluster.
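
For anyone wondering what "applying the DestinationRule again" looks like in practice, it is just kubectl apply of the existing rule; the YAML below is only a hypothetical sketch with v1/v2 subsets mirroring the cluster names in the Envoy output further down, not the actual rule from this cluster:

kubectl apply -n cloudspider -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: cloudspidergateway
spec:
  host: cloudspidergateway.cloudspider.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
EOF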

More questions

Why does Istio still retain the pod IP of a pod that has been stopped?
This issue may be related to #9480.

The log

In my case, the cloudspidergateway service has 2 pods in Kubernetes, whose IPs are 10.244.25.4 and 10.244.14.34, but istio-proxy thinks there are three pods (10.244.25.4, 10.244.7.51, 10.244.14.34).

The error log of istio-proxy:

docker logs <istio-proxy container> |grep 503 |grep UF

{"log":"[2019-01-23T19:28:44.859Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/ff808081662e858401687bfd7d51256fHTTP/1.1\" 503 UF 394 57 999 - \"-\" \"Java/1.8.0_181\" \"8c75b45d-5b51-98a8-b4f8-d378c675aae4\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41746\n","stream":"stdout","time":"2019-01-23T19:28:53.30835571Z"}
{"log":"[2019-01-23T19:28:55.415Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/e4e4781d662e89310168770017726a6dHTTP/1.1\" 503 UF 464 57 506 - \"-\" \"Java/1.8.0_181\" \"0ae79043-9708-9545-9d8b-76c2482cef33\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41752\n","stream":"stdout","time":"2019-01-23T19:29:03.311469856Z"}
{"log":"[2019-01-23T19:29:09.249Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/e4e4781d662e89310168770017726a6dHTTP/1.1\" 503 UF 466 57 495 - \"-\" \"Java/1.8.0_181\" \"a8592ead-409f-9a2a-a8c7-0ec6a25deb74\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41762\n","stream":"stdout","time":"2019-01-23T19:29:13.31274334Z"}
{"log":"[2019-01-23T19:29:27.814Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/e4e4781d662e89310168770017726a6dHTTP/1.1\" 503 UF 469 57 570 - \"-\" \"Java/1.8.0_181\" \"1f52f1fd-81d8-9329-8fba-9783244bdff6\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41768\n","stream":"stdout","time":"2019-01-23T19:29:33.312063164Z"}
{"log":"[2019-01-23T19:29:38.026Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/ff808081662e858401687bfd7d51256fHTTP/1.1\" 503 UF 394 57 1001 - \"-\" \"Java/1.8.0_181\" \"cc896a0a-4a1a-98ba-8daa-3e5354b31545\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41772\n","stream":"stdout","time":"2019-01-23T19:29:43.310374027Z"}
{"log":"[2019-01-23T19:29:46.382Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/e4e4781d662e89310168770017726a6dHTTP/1.1\" 503 UF 461 57 719 - \"-\" \"Java/1.8.0_181\" \"f5fbcb7b-aa36-9f18-aad5-04e9c3fc9d62\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41778\n","stream":"stdout","time":"2019-01-23T19:29:53.310893339Z"}
{"log":"[2019-01-23T19:29:58.154Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/e4e4781d662e89310168770017726a6dHTTP/1.1\" 503 UF 446 57 723 - \"-\" \"Java/1.8.0_181\" \"87c376c9-ad42-9c70-8e44-cdd4d31c76a5\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41784\n","stream":"stdout","time":"2019-01-23T19:30:03.309634823Z"}
{"log":"[2019-01-23T19:30:10.029Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/e4e4781d662e89310168770017726a6dHTTP/1.1\" 503 UF 438 57 692 - \"-\" \"Java/1.8.0_181\" \"139f3e91-adfc-9f7a-a0ab-b64fe20b9127\" \"cloudSpiderGateWay:9000\" \"10.244.7.51:9000\" outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local - 10.96.123.71:9000 10.244.13.6:41792\n","stream":"stdout","time":"2019-01-23T19:30:13.312569876Z"}
{"log":"[2019-01-23T19:30:21.735Z] \"POST /cloudSpiderAccessor/v1/executeReport/taskId/e4e4781d662e89310168770017^C

The Kubernetes Service describe output:

kubectl describe service cloudspidergateway -n cloudspider
Name:              cloudspidergateway
Namespace:         cloudspider
Labels:            app=cloudspidergateway
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"creationTimestamp":null,"labels":{"app":"cloudspidergateway"},"name":"cloudspidergate...
Selector:          app=cloudspidergateway
Type:              ClusterIP
IP:                10.96.123.71
Port:              http-9000-9000-ztdzg  9000/TCP
TargetPort:        9000/TCP
Endpoints:         10.244.14.34:9000,10.244.25.4:9000
Session Affinity:  None
Events:            <none>

The cluster info found in istio-proxy:

curl 127.0.0.1:15000/clusters |grep cloudspidergateway

outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_connections::1024
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_pending_requests::1024
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_requests::1024
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_retries::3
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_connections::1024
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_requests::1024
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_retries::3
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::added_via_api::true
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::cx_active::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::cx_connect_fail::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::cx_total::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_active::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_error::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_success::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_timeout::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_total::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::health_flags::healthy
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::weight::1
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::region::
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::zone::
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::sub_zone::
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::canary::false
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::success_rate::-1
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::cx_active::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::cx_connect_fail::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::cx_total::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_active::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_error::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_success::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_timeout::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_total::0
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::health_flags::healthy
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::weight::1
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::region::
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::zone::
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::sub_zone::
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::canary::false
outbound|9000|v2|cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::success_rate::-1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_connections::1024
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_pending_requests::1024
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_requests::1024
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_retries::3
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_connections::1024
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_requests::1024
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_retries::3
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::added_via_api::true
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::cx_active::1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::cx_connect_fail::1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::cx_total::12330
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_active::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_error::1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_success::78050
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_timeout::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_total::78051
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::health_flags::healthy
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::weight::1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::region::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::zone::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::sub_zone::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::canary::false
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::success_rate::-1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::cx_active::1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::cx_connect_fail::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::cx_total::2990
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_active::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_error::2
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_success::16921
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_timeout::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::rq_total::16923
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::health_flags::healthy
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::weight::1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::region::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::zone::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::sub_zone::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::canary::false
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.25.4:9000::success_rate::-1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::cx_active::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::cx_connect_fail::16933
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::cx_total::37860
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_active::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_error::16945
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_success::138003
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_timeout::0
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::rq_total::154948
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::health_flags::healthy
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::weight::1
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::region::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::zone::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::sub_zone::
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::canary::false
outbound|9000||cloudspidergateway.cloudspider.svc.cluster.local::10.244.7.51:9000::success_rate::-1
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_connections::1024
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_pending_requests::1024
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_requests::1024
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::default_priority::max_retries::3
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_connections::1024
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_requests::1024
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::high_priority::max_retries::3
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::added_via_api::true
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::cx_active::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::cx_connect_fail::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::cx_total::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_active::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_error::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_success::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_timeout::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::rq_total::0
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::health_flags::healthy
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::weight::1
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::region::
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::zone::
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::sub_zone::
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::canary::false
outbound|9000|v1|cloudspidergateway.cloudspider.svc.cluster.local::10.244.14.34:9000::success_rate::-1


@duderino duderino assigned duderino and unassigned costinm Feb 11, 2019
@duderino
Contributor

I'll try this one, but it won't make it into 1.1, sorry

@duderino
Contributor

@liutanrong thanks for the tip

@duderino duderino assigned howardjohn and unassigned duderino Feb 28, 2019
@duderino
Contributor

@howardjohn please do what you can here. Not sure if we can get a fix into 1.1, but please give it a shot

@duderino
Contributor

duderino commented Mar 5, 2019

This bug has become a collector for many problems that all end up looking similar to the downstream client because of the 503s. I'm going to close this since I think we have a large batch of 503 mitigations going into 1.1. Most are already in 1.1rc2, but a few more will go into 1.1rc3 which we will create later this week.

If you still have issues with 1.1rc3+ please file a fresh bug.

@duderino duderino closed this as completed Mar 5, 2019
@seanclerkin

I'm seeing this on 1.1rc2. After a rolling release of a service, the Ingress returns 503s until a restart of Pilot.

@franpog859

Is there any follow-up issue, @seanclerkin, @duderino? I would like to track the work on it 😉

@jaygorrell
Contributor

Seeing upstream 503 issues still on 1.1.2.

@howardjohn
Member

If you are seeing issues please open a new issue with details @jaygorrell

@AL-Cui

AL-Cui commented Apr 27, 2019

Similar behaviour with 1.0.5. Restarting the pilot pod helps.

Similar behaviour with 1.0.5 here. But how do I restart the pilot pod? I deleted the pilot pod, but it does not start again automatically. How do I restart pilot?
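
Normally pilot is managed by the istio-pilot Deployment, so a deleted pod should come back on its own; if it does not, one way to force a restart is to scale the Deployment down and back up (a sketch, assuming the default istio-system namespace and deployment name):

kubectl -n istio-system scale deployment istio-pilot --replicas=0
kubectl -n istio-system scale deployment istio-pilot --replicas=1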

@baifan

baifan commented May 4, 2019

Similar behaviour with 1.1.4.

[2019-05-04 08:06:37.571][19][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-04 08:06:37.571][19][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-04 08:06:37.571][19][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-04 08:06:37.575][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, no healthy upstream
[2019-05-04 08:06:37.575][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:49] Unable to establish new stream
[2019-05-04 08:06:38.105][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2019-05-04T08:06:38.858463Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-05-04 08:06:40.372][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2019-05-04T08:06:40.857978Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:06:42.858037Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-05-04 08:06:43.489][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2019-05-04T08:06:44.860023Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:06:46.858293Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:06:48.858084Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:06:50.858129Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:06:52.858128Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-05-04 08:06:53.675][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2019-05-04T08:06:54.857942Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:06:56.858256Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:06:58.858245Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:07:00.858143Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-05-04 08:07:02.448][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
2019-05-04T08:07:02.858391Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:07:04.858327Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-05-04T08:07:06.858297Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected

@ghost

ghost commented May 11, 2019

Version: 1.1.15

I am encountering the following issue, which stops my application; my pod is not able to launch.
I found the following error in Stackdriver, and the subsequent line says "terminating task executor".

[warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 13,
[warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure

So the application never launches.

As suggested in this thread, I have deleted istio-pilot, but the error persists.

@jamierobert

jamierobert commented May 16, 2019

This solved my issue:

kubectl delete meshpolicy default

This does carry security implications, and I assume a new MeshPolicy will have to be defined - but it does stop the 503s.
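
If you do delete it, a replacement MeshPolicy can be applied afterwards; a minimal sketch in PERMISSIVE mode (adjust to your actual mTLS requirements) would be something like:

kubectl apply -f - <<EOF
apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  name: default
spec:
  peers:
  - mtls:
      mode: PERMISSIVE
EOF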

@wdrdres3qew5ts21

I'm also facing the same problem. Sometimes, after upgrading a service version, my workload calls another Kubernetes Service that no longer exists (namespace not found), and in Kiali the graph shows the service calling out to PassthroughCluster (even though it is an internal service in the same cluster).
I don't know the root cause, but from what I see, the Envoy proxy is calling an old pod's IP, which causes the "Service not found".
What is the root cause of this problem? I don't really understand, but in my opinion Pilot may not be fully syncing to all Envoys, which causes the Envoy proxy to keep routing to an old pod IP even though it no longer exists.

@rlenglet
Contributor

rlenglet commented Aug 5, 2020

@wdrdres3qew5ts21 please file a separate issue. It's very likely to be completely unrelated to this issue.
